
    • Michael Meeks: 2025-03-04 Tuesday

      news.movim.eu / PlanetGnome • 4 March

    • Quickish planning call, customer call, catch up with Karen, Niels, poked at some debugging.
    • Supervised E. making her Quoridor board - managed to get an initial board completed, lots of slots cut with an improvised dado blade.
    • meeksfamily.uk/~michael/blog/2025-03-04.html

    • Andy Wingo: whippet lab notebook: on untagged mallocs

      news.movim.eu / PlanetGnome • 4 March • 6 minutes

    Salutations, populations. Today’s note is more of a work-in-progress than usual; I have been finally starting to look at getting Whippet into Guile, and there are some open questions.

    inventory

    I started by taking a look at how Guile uses the Boehm-Demers-Weiser collector’s API, to make sure I had all my bases covered for an eventual switch to something that was not BDW. I think I have a good overview now, and have divided the parts of BDW-GC used by Guile into seven categories.

    implicit uses

    Firstly there are the ways in which Guile’s run-time and compiler depend on BDW-GC’s behavior, without actually using BDW-GC’s API. By this I mean principally that we assume that any reference to a GC-managed object from any thread’s stack will keep that object alive. The same goes for references originating in global variables, or static data segments more generally. Additionally, we rely on GC objects not to move: references to GC-managed objects in registers or stacks are valid across a GC boundary, even if those references are outside the GC-traced graph: all objects are pinned.
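
    To make this implicit contract concrete, here is a minimal sketch of the behaviour being relied on, using only standard bdwgc entry points (GC_INIT, GC_malloc, GC_gcollect; the header path can vary by distribution): a reference held only in a stack local keeps its object alive across a collection, and the object keeps its address.

    #include <assert.h>
    #include <string.h>
    #include <gc.h>              /* bdwgc; sometimes installed as <gc/gc.h> */

    int main(void) {
        GC_INIT();

        /* The only reference to this object lives in a stack slot. */
        char *obj = GC_malloc(128);
        strcpy(obj, "still reachable");
        char *before = obj;

        /* A conservative, non-moving collector must treat the stack slot
         * as a root: the object survives and is not relocated. */
        GC_gcollect();

        assert(strcmp(obj, "still reachable") == 0);
        assert(obj == before);   /* "all objects are pinned" */
        return 0;
    }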

    Some of these “uses” are internal to Guile’s implementation itself, and thus amenable to being changed, albeit with some effort. However some escape into the wild via Guile’s API, or, as in this case, as implicit behaviors; these are hard to change or evolve, which is why I am putting my hopes on Whippet’s mostly-marking collector, which allows for conservative roots.

    defensive uses

    Then there are the uses of BDW-GC’s API, not to accomplish a task, but to protect the mutator from the collector: GC_call_with_alloc_lock, explicitly enabling or disabling GC, calls to sigmask that take BDW-GC’s use of POSIX signals into account, and so on. BDW-GC can stop any thread at any time, between any two instructions; for most users this is anodyne, but if ever you use weak references, things start to get really gnarly.
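
    As a concrete, minimal example of such a defensive pattern (a sketch, not Guile’s actual code): a section that must not be interrupted by a collection can be bracketed with bdwgc’s GC_disable/GC_enable.

    #include <gc.h>

    /* Hypothetical routine that must not observe the heap mid-collection,
     * e.g. because it pokes at weak-reference or signal-handling state. */
    void delicate_operation(void) {
        GC_disable();            /* no collection can start in here */
        /* ... touch the delicate state ... */
        GC_enable();             /* collections may resume */
    }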

    Of course a new collector would have its own constraints, but switching to cooperative instead of pre-emptive safepoints would be a welcome relief from this mess. On the other hand, we will more often require client code to explicitly mark its threads as inactive during calls, to ensure that all threads can promptly reach safepoints at all times. Swings and roundabouts?

    precise tracing

    Did you know that the Boehm collector allows for precise tracing? It does! It’s slow and truly gnarly, but when you need precision, precise tracing is nice to have. (This is the GC_new_kind interface.) Guile uses it to mark Scheme stacks, allowing it to avoid treating unboxed locals as roots. When it loads compiled files, Guile also adds some slices of the mapped files to the root set. These interfaces will need to change a bit in a switch to Whippet but are ultimately internal, so that’s fine.
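
    As an illustration of what precise stack tracing buys you, here is a purely hypothetical sketch (none of these types or functions are Guile’s; the real code goes through BDW’s GC_new_kind machinery): a frame map records which stack slots hold boxed values, and only those slots are reported as roots, so unboxed locals are never mistaken for pointers.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical frame descriptor: one bit per stack slot, set if
     * the slot holds a boxed (GC-managed) value. */
    struct frame_map {
        size_t   nslots;
        uint32_t boxed_bits;     /* slot i is boxed iff bit i is set */
    };

    typedef void (*trace_slot_fn)(void **slot, void *trace_data);

    /* Walk one frame precisely: unboxed slots (raw integers, doubles,
     * untagged temporaries) are skipped instead of being treated as
     * potential pointers the way a conservative scan would treat them. */
    void trace_frame(void **slots, const struct frame_map *map,
                     trace_slot_fn trace, void *trace_data) {
        for (size_t i = 0; i < map->nslots && i < 32; i++) {
            if (map->boxed_bits & (UINT32_C(1) << i))
                trace(&slots[i], trace_data);
        }
    }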

    What is not fine is that Guile allows C users to hook into precise tracing, notably via scm_smob_set_mark. This is not only the wrong interface, not allowing for copying collection, but these functions are just truly gnarly. I don’t know what to do with them yet; are our external users ready to forgo this interface entirely? We have been working on them over time, but I am not sure.

    reachability

    Weak references, weak maps of various kinds: the implementation of these in terms of BDW’s API is incredibly gnarly and ultimately unsatisfying. We will be able to replace all of these with ephemerons and tables of ephemerons, which are natively supported by Whippet. The same goes for finalizers.

    The same goes for constructs built on top of finalizers, such as guardians; we’ll get to reimplement these on top of nice Whippet-supplied primitives. Whippet allows for resuscitation of finalized objects, so all is good here.

    misc

    There is a long list of miscellanea: the interfaces to explicitly trigger GC, to get statistics, to control the number of marker threads, to initialize the GC; these will change, but all uses are internal, making it not a terribly big deal.

    I should mention one API concern, which is that BDW’s state is all implicit. For example, when you go to allocate, you don’t pass the API a handle which you have obtained for your thread, and which might hold some thread-local freelists; BDW will instead load thread-local variables in its API. That’s not as efficient as it could be and Whippet goes the explicit route, so there is some additional plumbing to do.
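
    The difference in API shape looks roughly like this; the explicit-handle side is a sketch of the style Whippet takes, with illustrative names (gc_mutator, gc_allocate) rather than guaranteed signatures.

    #include <stddef.h>

    /* Implicit state, BDW style: the library finds the calling thread's
     * allocation state via thread-local lookups inside the call. */
    void *GC_malloc(size_t size);                 /* bdwgc entry point */

    void *alloc_implicit(size_t size) {
        return GC_malloc(size);                   /* no handle to pass */
    }

    /* Explicit state, Whippet style (names are illustrative): the caller
     * threads a per-mutator handle through, so the fast path can use its
     * local freelists or bump pointer without a thread-local lookup. */
    struct gc_mutator;                            /* opaque per-thread handle */
    void *gc_allocate(struct gc_mutator *mut, size_t size);

    void *alloc_explicit(struct gc_mutator *mut, size_t size) {
        return gc_allocate(mut, size);
    }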

    Finally I should mention the true miscellaneous BDW-GC function: GC_free. Guile exposes it via an API, scm_gc_free. It was already vestigial and we should just remove it, as it has no sensible semantics or implementation.

    allocation

    That brings me to what I wanted to write about today, but am going to have to finish tomorrow: the actual allocation routines. BDW-GC provides two, essentially: GC_malloc and GC_malloc_atomic. The difference is that “atomic” allocations don’t refer to other GC-managed objects, and as such are well-suited to raw data. Otherwise you can think of atomic allocations as a pure optimization, given that BDW-GC mostly traces conservatively anyway.
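
    For example, a minimal sketch using the two bdwgc entry points (the pair type here is made up for illustration):

    #include <gc.h>
    #include <string.h>

    struct pair {
        void *car;
        void *cdr;
    };

    /* Contains references to other GC-managed objects, so the collector
     * must scan its fields: use GC_malloc. */
    struct pair *make_pair(void *car, void *cdr) {
        struct pair *p = GC_malloc(sizeof *p);
        p->car = car;
        p->cdr = cdr;
        return p;
    }

    /* Raw bytes with no embedded pointers: GC_malloc_atomic tells the
     * collector it never needs to scan the payload. */
    char *copy_string(const char *s) {
        size_t len = strlen(s) + 1;
        char *copy = GC_malloc_atomic(len);
        memcpy(copy, s, len);
        return copy;
    }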

    From the perspective of a user of BDW-GC looking to switch away, there are two broad categories of allocations, tagged and untagged.

    Tagged objects have attached metadata bits allowing their type to be inspected by the user later on. This is the happy path! We’ll be able to write a gc_trace_object function that takes any object, does a switch on, say, some bits in the first word, dispatching to type-specific tracing code. As long as the object is sufficiently initialized by the time the next safepoint comes around, we’re good, and given cooperative safepoints, the compiler should be able to ensure this invariant.
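
    A gc_trace_object along those lines might look like the following sketch; the tag values and object layouts are invented for illustration and are not Guile’s actual representation.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical tagged layout: the low bits of the first word
     * identify the object's type. */
    enum tag { TAG_PAIR = 1, TAG_VECTOR = 2, TAG_STRING = 3 };
    #define TAG_MASK 0x7

    struct object { uintptr_t header; };
    struct tagged_pair { uintptr_t header; struct object *car, *cdr; };
    struct tagged_vector { uintptr_t header; size_t len; struct object *elts[]; };
    /* strings hold no GC references, so there is nothing to trace */

    typedef void (*trace_edge_fn)(struct object **edge, void *data);

    void gc_trace_object(struct object *obj, trace_edge_fn trace, void *data) {
        switch (obj->header & TAG_MASK) {
        case TAG_PAIR: {
            struct tagged_pair *p = (struct tagged_pair *)obj;
            trace(&p->car, data);
            trace(&p->cdr, data);
            break;
        }
        case TAG_VECTOR: {
            struct tagged_vector *v = (struct tagged_vector *)obj;
            for (size_t i = 0; i < v->len; i++)
                trace(&v->elts[i], data);
            break;
        }
        case TAG_STRING:
            break;               /* leaf object: nothing to trace */
        }
    }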

    Then there are untagged allocations. Generally speaking, these are of two kinds: temporary and auxiliary. An example of a temporary allocation would be growable storage used by a C run-time routine, perhaps as an unbounded-size alternative to alloca. Guile uses these a fair amount, as they compose well with non-local control flow, as occurs for example in exception handling.

    An auxiliary allocation on the other hand might be a data structure only referred to by the internals of a tagged object, but which itself never escapes to Scheme, so you never need to inquire about its type; it’s convenient to have the lifetimes of these values managed by the GC, and when desired to have the GC automatically trace their contents. Some of these should just be folded into the allocations of the tagged objects themselves, to avoid pointer-chasing. Others are harder to change, notably for mutable objects. And the trouble is that for external users of scm_gc_malloc, I fear that we won’t be able to migrate them over, as we don’t know whether they are making tagged mallocs or not.
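
    One concrete shape of that folding, as a sketch with made-up types: instead of a tagged object pointing at a separately allocated, untagged auxiliary buffer, allocate the buffer inline in the tagged object itself.

    #include <stddef.h>
    #include <stdint.h>

    /* Before: the tagged object keeps a pointer to an untagged,
     * GC-managed auxiliary allocation that never escapes to Scheme. */
    struct bignum_indirect {
        uintptr_t  header;       /* tag bits */
        size_t     nlimbs;
        uint64_t  *limbs;        /* separate untagged allocation */
    };

    /* After: fold the auxiliary data into the tagged allocation itself.
     * One allocation, no pointer chasing, and tracing is trivial because
     * the payload contains no GC references. */
    struct bignum_inline {
        uintptr_t  header;
        size_t     nlimbs;
        uint64_t   limbs[];      /* flexible array member, inline */
    };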

    what is to be done?

    One conventional way to handle untagged allocations is to manage to fit your data into other tagged data structures; V8 does this in many places with instances of FixedArray, for example, and Guile should do more of this. Otherwise, you make new tagged data types. In either case, all auxiliary data should be tagged.

    I think there may be an alternative, which would be just to support the equivalent of untagged GC_malloc and GC_malloc_atomic ; but for that, I am out of time today, so type at y’all tomorrow. Happy hacking!

    • wingolog.org/archives/2025/03/04/whippet-lab-notebook-on-untagged-mallocs

    • Aryan Kaushik: Create Custom System Call on Linux 6.8

      news.movim.eu / PlanetGnome • 28 February • 4 minutes

    Ever wanted to create a custom system call? Whether as an assignment, just for fun, or to learn more about the kernel, system calls are a cool way to explore how our system works.

    Note - crossposted from my article on Medium

    Why follow this guide?

    There are various guides on this topic, but the pace of kernel development has made most of them obsolete; following them throws a bunch of errors. Hence I’m writing this post after going through those errors and solving them :)

    Set up your system for kernel compilation

    On Red Hat / Fedora / openSUSE based systems, you can simply do

    sudo dnf builddep kernel
    sudo dnf install kernel-devel
    

    On Debian / Ubuntu based systems

    sudo apt-get install build-essential vim git cscope libncurses-dev libssl-dev bison flex
    

    Get the kernel

    Clone the kernel source tree. We’ll check out the v6.8 tag specifically, but the instructions should work on newer versions as well (until the kernel devs change the process again).

    git clone --depth=1 --branch v6.8 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    

    Ideally, the cloned version should be equal to or higher than your current kernel version.

    You can check the current kernel version through the command

    uname -r
    

    Create the new syscall

    Perform the following

    cd linux
    make mrproper
    mkdir hello
    cd hello
    touch hello.c
    touch Makefile
    

    This will create a folder called “hello” inside the downloaded kernel source code, and create two files in it — hello.c with the syscall code and Makefile with the rules on compiling the same.

    Open hello.c in your favourite text editor and put the following code in it

    #include <linux/kernel.h>
    #include <linux/syscalls.h>

    SYSCALL_DEFINE0(hello)
    {
            pr_info("Hello World\n");
            return 0;
    }
    

    It prints “Hello World” in the kernel log.

    As per the kernel.org docs, system calls are defined with the SYSCALL_DEFINEn() macro rather than explicitly:

    “The ‘n’ indicates the number of arguments to the system call, and the macro takes the system call name followed by the (type, name) pairs for the parameters as arguments.”

    As we are just going to print and take no arguments, we use n = 0.

    Now add the following to the Makefile

    obj-y := hello.o
    

    Now

    cd ..
    cd include/linux/
    

    Open the file “syscalls.h” inside this directory, and add

    asmlinkage long sys_hello(void);
    


    This is a prototype for the syscall function we created earlier.

    Open the file “Kbuild” in the kernel root (cd ../..) and add the following at the bottom of it

    obj-y += hello/
    


    This tells the kernel build system to also compile our newly included folder.

    Once done, we then need to also add the syscall entry to the architecture-specific table.

    Each CPU architecture has its own syscall table, so we need to register our syscall in the table for the architecture we are targeting.

    For x86_64 the file is

    arch/x86/entry/syscalls/syscall_64.tbl
    

    Add your syscall entry there, keeping in mind to only use a free number and not use any numbers prohibited in the table comments.

    For me 462 was free, so I added the new entry as such

    462 common hello sys_hello
    


    Here 462 is the number mapped to our syscall, “common” means the entry applies to both the 64-bit and x32 ABIs, our syscall is named hello, and its entry point is sys_hello.

    Compiling and installing the new kernel

    Perform the following commands

    NOTE: I in no way or form guarantee the safety, security, integrity and stability of your system by following this guide. All instructions listed here reflect my own experience and don’t guarantee the outcome on your system. Proceed with caution and care.

    Now that we have the legal stuff done, let’s proceed ;)

    cp /boot/config-$(uname -r) .config
    make olddefconfig
    make -j $(nproc)
    sudo make -j $(nproc) modules_install
    sudo make install
    

    Here we copy the currently booted kernel’s config file and ask the build system to reuse its values, taking defaults for anything new. Then we build the kernel in parallel, using the number of cores reported by nproc. After that we install the modules and then our custom kernel (at your own risk).

    Kernel compilation takes a lot of time, so get a coffee or 10 and enjoy lines of text scrolling by on the terminal.

    It can take a few hours based on system speed so your mileage may vary. Your fan might also scream at this stage to keep temperatures under check (happened to me too).

    The fun part, using the new syscall

    Now that our syscall is baked into our kernel, reboot the system and make sure to select the new custom kernel from GRUB while booting.


    Once booted, let’s write a C program to leverage the syscall

    Create a file, maybe “test.c” with the following content

    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    int main(void) {
      printf("%ld\n", syscall(462));
      return 0;
    }
    

    Here replace 462 with the number you chose for your syscall in the table.

    Compile the program and then run it

    make test
    chmod +x test
    ./test
    

    If all goes right, your terminal will print a “0” and the syscall output will be visible in the kernel logs.

    Access the logs with dmesg

    sudo dmesg | tail
    

    And voila, you should be able to see your syscall message printed there.

    Congratulations if you made it 🎉

    Please again remember the following points:

    • Compiling kernel takes a lot of time
    • The newly compiled kernel takes quite a bit of space, so please ensure enough disk space is available
    • Linux kernel moves fast with code changes
    • www.aryank.in/posts/2025-02-28-linux-syscall/

    • Thibault Martin: Prosthetics that don't betray

      news.movim.eu / PlanetGnome • 28 February • 13 minutes

    Tech takes a central place in our lives. Banking and administrative tasks are happening more and more online. It's becoming increasingly difficult to get through life without a computer or a smartphone. They have become external organs necessary to live our lives.

    Steve Jobs called the computer the bicycle for the mind. I believe computers & smartphones have become prosthetics, extensions of people that should unconditionally and entirely belong to them. We must produce devices and products the general public can trust.

    Microsoft, Google and Apple are three American companies that build the operating systems our computers, phones, and servers run on. This American hegemony over ubiquitous devices is dangerous for citizens worldwide, especially under an unpredictable, authoritarian American administration.

    Producing devices and an operating system for them is a gigantic task. Fortunately, it is not necessary to start from zero. In this post I share what I think is the best foundation for a respectful operating system and how to get it into European, and maybe American, hands. In a follow-up post I will talk more about distribution channels for older devices.

    [!warning] The rest of the world matters

    In this post I take a European-centric view. The rest of the world matters, but I am not familiar with what their needs are, nor with how to address them.

    We're building prosthetics

    Prosthetics are extensions of ourselves as individuals. They are deeply personal. We must ensure our devices & products are:

    • Transparent about what they do. They must not betray people and do things behind their backs. Our limbs do what we tell them. When they don't, it's considered a problem and we go to a physician to fix it.
    • Intuitive, documented, accessible and stable. People shouldn't have to re-learn how to do things they were used to doing. When they don't know how to do something, it must be easy for them to look it up or find someone to explain it to them. The devices must also be accessible and inclusive to reduce inequalities, instead of reinforcing them. Those requirements are a social matter, not a technical one.
    • Reliable, affordable, and repairable. Computers & smartphones must not allow discrimination based on social status and wealth. Everyone must have access to devices they can count on, and be able to maintain them in a good condition. This is also a social problem and not a technical one. It is worth noting that "the apps I need must be available for my system" is an often overlooked aspect of reliability, and "I don't have to install the system because it's bundled with my machine" is an important aspect of affordability.

    I believe that the GNOME project is one of the best placed to answer those challenges, especially when working in coordination with the excellent postmarketOS people who work on resurrecting older devices abandoned by their manufacturers. There is real stagnation in the computing industry that we must see as a social opportunity.

    Constraints are good

    GNOME is a computing environment aiming for simplicity and efficiency. Its opinionated approach benefits both users and developers:

    • From the user perspective , apps look and feel consistent and sturdy, and are easy to use thanks to well thought out defaults.
    • From the developer perspective , the opinionated human interface guidelines let them develop simpler, more predictable apps with less edge cases to test for.

    GNOME is a solid foundation to build respectful tech. It doesn't betray people by doing things behind their back. It aims for simplicity and stability, although it could use some more user research to back design decisions if there was funding to do so, as was successfully the case for GNOME 40.

    Mobile matters

    GNOME's Human Interface Guidelines and development tooling make it easy to run GNOME apps on mobile devices. Some volunteers are also working on making GNOME Shell (the "desktop" view) render well on mobile devices.

    postmarketOS already offers it as one of the UIs you can install on your phone. With mobile taking over traditional computer usage, it is critical to consider the mobile side of computing too.

    Hackability and safety

    As an open source project, GNOME remains customizable by advanced users who know they are bringing unsupported changes, can break their system in the process, and deal with it. It doesn't make customization easy for those advanced users, because it doesn't optimize for them.

    The project also has its fair share of criticism, some valid, and some not. I agree that sometimes the project can be too opinionated and rigid, optimizing for extreme consistency at the expense of user experience. For example, while I agree that system trays are suboptimal, they're also a pattern people have been used to for decades, and removing them is very frustrating for many.

    But some criticism is also coming from people who want to tinker with their system and spend countless hours building a system that's the exact fit for their needs. Those are valid use cases, but GNOME is not built to serve them. GNOME aims to be easy to use for the general public, which includes people who are not tech-experts and don't want to be.

    We're actually building prototypes

    As mighty as the GNOME volunteers might be, there is still a long way before the general public can realistically use it. GNOME needs to become a fully fledged product shipped on mainstream devices, rather than an alternative system people install. It also needs to involve representatives of the people it intends to serve.

    You just need to simply be tech-savvy

    GNOME is not (yet) an end user product. It is a desktop environment that needs to be shipped as part of a Linux distribution. There are many distributions to choose from. They are not shipping the same version of GNOME, and some patch it more or less heavily. This kind of fragmentation is one of the main factors holding the Linux desktop back.

    The general public doesn't want to have to pick a distribution and bump into all the edge cases that creates. They need a system that works predictably, that lets them install the apps they need, and that gives them safe ways to customize it as a user.

    That means they need a system that doesn't let them shoot themselves in the foot in the name of customizability, and that prevents them from doing some things unless they sign with their blood that they know it could make it unusable. I share Adrian Vovk's vision for A Desktop for All and I think it's the best way to productize GNOME and make it usable by the general public.

    People don't want to have to install an "alternative" system . They want to buy a computer or a smartphone and use it. For GNOME to become ubiquitous, it needs to be shipped on devices people can buy.

    For GNOME to really take off, it needs to become a system people can use both in their personal life and at work. It must become a compelling product in enterprise deployments: to route enough money towards development and maintenance, to make it an attractive platform for vendors to build software for, and to make it an attractive platform for device manufacturers to ship.

    What about the non tech-savvy?

    GNOME aims to build a computing platform everyone can trust. But it doesn't have a clear, scalable governance model with representatives of those it serves. GNOME has rudimentary governance to define what is part of the project and what is not thanks to its Release Team, but it is largely a do-ocracy, as highlighted in the Governance page of GNOME's Handbook as well as in GNOME designer Tobias Bernard's series Community Power.

    A do-ocracy is a very efficient way to onboard volunteers and empower people who can give away their free time to get things done fast. It is however not a great way to get work done on areas that matter to a minority who can't afford to give away free time or pay someone to work on it.

    The GNOME Foundation is indeed not GNOME's vendor today, and it doesn't contribute the bulk of the design and code of the project. It maintains the infrastructure (technical and organizational) the project builds on. A critical, yet little visible task.

    To be a meaningful, fair, inclusive project for more than engineers with spare time and spare computers, the project needs to improve in two areas:

    1. It needs a Product Committee to set a clear product direction so GNOME can meaningfully address the problems of its intended audience. The product needs a clear purpose, a clear audience, and a robust governance to enforce decisions. It needs a committee with representatives of the people it intends to serve, designers, and solution architects. Of course it also critically needs a healthy set of public and private organizations funding it.
    2. It needs a Development Team to implement the direction the committee has set. This means doing user research and design, technical design, implementing the software, doing advocacy work to promote the project to policymakers, manufacturers, private organizations' IT department and much more.

    [!warning] Bikeshedding is a real risk

    A Product Committee can be a useful structure for people to express their needs, draft a high-level and realistic solution with designers and solution architects, and test it. Designers and technical architects must remain in charge of designing and implementing the solution.

    The GNOME Foundation appears as a natural host for these organs, especially since it's already taking care of the assets of the project like its infrastructure and trademark. A separate organization could more easily pull the project in a direction that serves its own interests.

    Additionally, the GNOME Foundation taking on this kind of work doesn't conflict with the present do-ocracy, since volunteers and organizations could still work on what matters to them. But it would remain a major shift in the project's organization and would likely upset some volunteers who would feel that they have less control over the project.

    I believe this is a necessary step to make the public and private sector invest in the project, generate stable employment for people working on it, and ultimately make GNOME have a systemic, positive impact on society.

    [!warning] GNOME needs solution architects

    The GNOME community has designers who have a good product vision. It is also full of experts on their module, but has a shortage of people with a good technical overview of the project, who can turn product issues into technical ones at the scale of the whole project.

    So what now?

    "The year of the Linux desktop" has become a meme now for a reason. The Linux community, if such a nebulous thing exists, is very good at solving technical problems. But building a project bigger than ourselves and putting it in the hands of the millions of people who need it is not just a technical problem.

    Here are some critical next steps for the GNOME Community and Foundation to reclaim personal computing from the trifecta of tech behemoths, and fulfill an increasingly important need for democracies.

    Learn from experience

    Last year, a team of volunteers led by Sonny Piers and Tobias Bernard wrote a grant bid for the Sovereign Tech Fund, and got granted €1M. There are some major takeaways from this adventure.

    At risk of stating the obvious, money does solve problems! The team tackled significant technical issues not just for GNOME but for the free desktop in general. I urge organizations and governments that take their digital independence seriously to contribute meaningfully to the project.

    Uncertainty and understaffing have a cost . Everyone working on that budget was paid €45/hour, which is way lower than the market average. The project leads were only billing half-time on the project but worked much more than that in practice, and burnt out on it. Add some operational issues within the Foundation that wasn't prepared to properly support this initiative and you get massive drama that could have been avoided.

    Finally and unsurprisingly, one-offs are not sustainable . The Foundation needs to build sustainable revenue streams from a diverse portfolio to grow its team. A €1M grant is extremely generous from a single organization. It was a massive effort from the Sovereign Tech Agency, and a significant part of their 2024 budget. But it is also far from enough to sustain a project like GNOME if every volunteer was paid, let alone paid a fair wage.

    Tread carefully, change democratically

    Governance and funding are a chicken and egg problem. Funders won't send money to the project if they are not confident that the project will use it wisely, and if they can't weigh in on the project's direction. Without money to support the effort, only volunteers can set up the technical governance processes, in their spare time.

    Governance changes must be done carefully though. Breaking the status quo without a plan comes with significant risks. It can demotivate current volunteers, make the project lose traction with newcomers, and die before enough funding makes it to the project to sustain it. A lot of people have invested significant amounts of time and effort into GNOME, and this must be treated with respect.

    Build a focused MVP

    For the STF project, the GNOME Foundation relied on contractors and consultancies. To be fully operational and efficient, it must get into a position to hire people with the most critical skills. I believe right now the most critical profile is the solution architect one. With more revenue, developers and designers can join the team as it grows.

    But for that to happen, the Foundation needs to:

    1. Define who GNOME is for in priority, bearing in mind that "everyone" doesn't exist.
    2. Build a team of representatives of that audience, and a product roadmap: what problems do these people have that GNOME could solve, how could GNOME solve it for them, how could people get to using GNOME, and what tradeoffs would they have to make when using GNOME.
    3. Build the technical roadmap (the steps to make it happen).
    4. Fundraise to implement the roadmap, factoring in the roadmap creation costs.
    5. Implement, and test

    The Foundation can then build on this success and start engaging with policymakers, manufacturers, and vendors to extend its reach.

    Alternative proposals

    The model proposed has a significant benefit: it gives clarity. You can give money to the GNOME Foundation to contribute to the maintenance and evolution of the GNOME project, instead of only supporting its infrastructure costs. It unlocks the possibility to fund user research that would also benefit all the downstreams.

    It is possible to take the counter-point and argue that GNOME doesn't have to be an end-user product, but should remain an upstream that several organizations use for their own product and contribute to.

    The "upstream only" model is status-quo, and the main advantage of this model is that it lets contributing organizations focus on what they need the most. The GNOME Foundation would need to scale down to a minimum to only support the shared assets and infrastructure of the project and minimize its expenses. Another (public?) organization would need to tackle the problem of making GNOME a well integrated end-user product.

    In the "upstream only" model, there are two choices:

    • Either the governance of GNOME itself remains the same , a do-ocracy where whoever has the skills, knowledge and financial power to do so can influence the project.
    • Or the Community can introduce a more formal governance model to define what is part of GNOME and what is not, like Python PEPs and Rust's RFCs.

    It's an investment

    Building an operating system usable by the masses is a significant effort and requires a lot of expertise. It is tempting to think that since Microsoft, Google and Apple are already shipping several operating systems each, we don't need one more.

    However, let's remember that these are all American companies, building proprietary ecosystems that they have complete control over. In these uncertain times, Europe must not treat the USA as a direct enemy, but the current administration makes it clear that it would be reckless to continue treating it as an ally.

    Building an international, transparent operating system that provides an open platform for people to use and for which developers can distribute apps will help secure EU's digital sovereignty and security, at a cost that wouldn't even make a dent in the budget. It's time for policymakers to take their responsibilities and not let America control the digital public space.

    • ergaster.org/posts/2025/02/28-prosthetics-that-dont-betray/

    • Felipe Borges: GNOME is participating in Google Summer of Code 2025!

      news.movim.eu / PlanetGnome • 27 February

    The Google Summer of Code 2025 mentoring organizations have just been announced and we are happy that GNOME’s participation has been accepted!

    If you are interested in having an internship with GNOME, check gsoc.gnome.org for our project ideas and getting started information.

    • feborg.es/gnome-is-participating-in-google-summer-of-code-2025/

    • Jussi Pakkanen: The price of statelessness is eternal waiting

      news.movim.eu / PlanetGnome • 27 February • 4 minutes

    Most CI systems I have seen have been stateless. That is, they start by getting a fresh Docker container (or building one from scratch), doing a Git checkout, building the thing and then throwing everything away. This is simple and mathematically pure, but really slow. This approach is further driven by the fact that in cloud computing CPU time and network transfers are cheap but storage is expensive: probably because the cloud vendor needs to take care of things like backups, can't dispatch the task on any machine on the planet but only on the one that already has the required state, and so on.

    How much could you reduce resource usage (or, if you prefer, improve CI build speed) by giving up on statelessness? Let's find out by running some tests. To get a reasonably large code base I used LLVM. I did not actually use any cloud or Docker in the tests, but I simulated them on a local media PC. I used 16 cores to compile and 4 to link (any more would saturate the disk). Tests were not run.

    Baseline

    Creating a Docker container with all the build deps takes a few minutes. Alternatively you can prebuild it, but then you need to download a 1 GB image.

    Doing a full Git checkout would be wasteful. There are basically three different ways of doing a partial checkout: shallow clone, blobless and treeless. They take the following amount of time and space:

    • shallow: 1m, 259 MB
    • blobless: 2m 20s, 961 MB
    • treeless: 1m 46s, 473 MB

    Doing a full build from scratch takes 42 minutes.

    With CCache

    Using CCache in Docker is mostly a question of bind mounting a persistent directory in the container's cache directory. A from-scratch build with an up to date CCache takes 9m 30s.

    With stashed Git repo

    Just like the CCache dir, the Git checkout can also be persisted outside the container. Doing a git pull on an existing full checkout takes only a few seconds. You can even mount the repo dir read only to ensure that no state leaks from one build invocation to another.

    With Danger Zone

    One main thing a CI build ensures is that the code keeps on building when compiled from scratch. It is quite possible to have a bug in your build setup that manifests itself so that the build succeeds if a build directory has already been set up, but fails if you try to set it up from scratch. This was especially common back in ye olden times when people used to both write Makefiles by hand and to think that doing so was a good idea.

    Nowadays build systems are much more reliable and this is not such a common issue (though it can definitely still occur). So what if you were willing to give up full from-scratch checks on merge requests? You could, for example, still have a daily build that validates that use case. For some organizations this would not be acceptable, but for others it might be a reasonable tradeoff. After all, why should a CI build take noticeably longer than an incremental build on the developer's own machine? If anything it should be faster, since servers are a lot beefier than developer laptops. So let's try it.

    The implementation for this is the same as for CCache, you just persist the build directory as well. To run the build you do a Git update, mount the repo, build dir and optionally CCache dirs to the container and go.

    I tested this by doing a git pull on the repo and then doing a rebuild. There were a couple of new commits, so this should be representative of the real world workloads. An incremental build took 8m 30s whereas a from scratch rebuild using a fully up to date cache took 10m 30s.

    Conclusions

    The amount of wall clock time used for the three main approaches were:

    • Fully stateless
      • Image building: 2m
      • Git checkout: 1m
      • Build: 42m
      • Total : 45m
    • Cached from-scratch
      • Image building: 0m (assuming it is not "apt-get update"d for every build)
      • Git checkout: 0m
      • Build: 10m 30s
      • Total : 10m 30s
    • Fully cached
      • Image building: 0m
      • Git checkout: 0m
      • Build: 8m 30s
      • Total : 8m 30s

    Similarly, the amount of data transferred was:

    • Fully stateless
      • Image: 1G
      • Checkout: 300 MB
    • Cached from-scratch:
      • Image: 0
      • Checkout: O(changes since last pull), typically a few kB
    • Fully cached
      • Image: 0
      • Checkout: O(changes since last pull)

    The differences are quite clear. Just by using CCache the build time drops by almost 80%. Persisting the build dir reduces the time by a further 15%. It turns out that having machines dedicated to a specific task can be a lot more efficient than rebuilding the universe from atoms every time. Fancy that.

    The final 2 minute improvement might not seem like that much, but on the other hand do you really want your developers to spend 2 minutes twiddling their thumbs for every merge request they create or update? I sure don't. Waiting for CI to finish is one of the most annoying things in software development.

    • nibblestew.blogspot.com/2025/02/the-price-of-statelessness-is-eternal.html

    • Sebastian Pölsterl: scikit-survival 0.24.0 released

      news.movim.eu / PlanetGnome • 26 February • 4 minutes

    It’s my pleasure to announce the release of scikit-survival 0.24.0.

    A highlight of this release is the addition of cumulative_incidence_competing_risks(), which implements a non-parametric estimator of the cumulative incidence function in the presence of competing risks. In addition, the release adds support for scikit-learn 1.6, including support for missing values in ExtraSurvivalTrees.

    Analysis of Competing Risks

    In classical survival analysis, the focus is on the time until a specific event occurs. If no event is observed during the study period, the time of the event is considered censored. A common assumption is that censoring is non-informative, meaning that censored subjects have a similar prognosis to those who were not censored.

    Competing risks arise when each subject can experience an event due to one of $K$ ($K \geq 2$) mutually exclusive causes, termed competing risks. Thus, the occurrence of one event prevents the occurrence of other events. For example, after a bone marrow transplant, a patient might relapse or die from transplant-related causes (transplant-related mortality). In this case, death from transplant-related mortality precludes relapse.

    The bone marrow transplant data from Scrucca et al., Bone Marrow Transplantation (2007) includes data from 35 patients grouped into two cancer types: Acute Lymphoblastic Leukemia (ALL; coded as 0), and Acute Myeloid Leukemia (AML; coded as 1).

    from sksurv.datasets import load_bmt

    bmt_features, bmt_outcome = load_bmt()
    diseases = bmt_features["dis"].cat.rename_categories(
        {"0": "ALL", "1": "AML"}
    )
    diseases.value_counts().to_frame()
    
    dis    count
    AML       18
    ALL       17

    During the follow-up period, some patients might experience a relapse of the original leukemia or die while in remission (transplant related death). The outcome is defined similarly to standard time-to-event data, except that the event indicator specifies the type of event, where 0 always indicates censoring.

    import pandas as pd

    status_labels = {
        0: "Censored",
        1: "Transplant related mortality",
        2: "Relapse",
    }
    risks = pd.DataFrame.from_records(bmt_outcome).assign(
        label=lambda x: x["status"].replace(status_labels)
    )
    risks["label"].value_counts().to_frame()
    
    label                          count
    Relapse                           15
    Censored                          11
    Transplant related mortality       9

    The table above shows the number of observations for each status.

    Non-parametric Estimator of the Cumulative Incidence Function

    If the goal is to estimate the probability of relapse, transplant-related death is a competing risk event. This means that the occurrence of relapse prevents the occurrence of transplant-related death, and vice versa. We aim to estimate curves that illustrate how the likelihood of these events changes over time.

    Let’s begin by estimating the probability of relapse using the complement of the Kaplan-Meier estimator. With this approach, we treat deaths as censored observations. One minus the Kaplan-Meier estimator provides an estimate of the probability of relapse before time $t$.

    import matplotlib.pyplot as plt
    from sksurv.nonparametric import kaplan_meier_estimator

    times, km_estimate = kaplan_meier_estimator(
        bmt_outcome["status"] == 1, bmt_outcome["ftime"]
    )
    plt.step(times, 1 - km_estimate, where="post")
    plt.xlabel("time $t$")
    plt.ylabel("Probability of relapsing before time $t$")
    plt.ylim(0, 1)
    plt.grid()
    
    [Figure: bmt-kaplan-meier.svg]

    However, this approach has a significant drawback: considering death as a censoring event violates the assumption that censoring is non-informative. This is because patients who died from transplant-related mortality have a different prognosis than patients who did not experience any event. Therefore, the estimated probability of relapse is often biased.

    The cause-specific cumulative incidence function (CIF) addresses this problem by estimating the cause-specific hazard of each event separately. The cumulative incidence function estimates the probability that the event of interest occurs before time $t$, and that it occurs before any of the competing causes of an event. In the bone marrow transplant dataset, the cumulative incidence function of relapse indicates the probability of relapse before time $t$, given that the patient has not died from other causes before time $t$.

    from sksurv.nonparametric import cumulative_incidence_competing_risks

    times, cif_estimates = cumulative_incidence_competing_risks(
        bmt_outcome["status"], bmt_outcome["ftime"]
    )
    plt.step(times, cif_estimates[0], where="post", label="Total risk")
    for i, cif in enumerate(cif_estimates[1:], start=1):
        plt.step(times, cif, where="post", label=status_labels[i])
    plt.legend()
    plt.xlabel("time $t$")
    plt.ylabel("Probability of event before time $t$")
    plt.ylim(0, 1)
    plt.grid()
    
    [Figure: bmt-cumulative-incidence.svg]

    The plot shows the estimated probability of experiencing an event at time $t$ for both the individual risks and for the total risk.

    Next, we want to estimate the cumulative incidence curves for the two cancer types — acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) — to examine how the probability of relapse depends on the original disease diagnosis.

    _, axs = plt.subplots(2, 2, figsize=(7, 6), sharex=True, sharey=True)

    for j, disease in enumerate(diseases.unique()):
        mask = diseases == disease
        event = bmt_outcome["status"][mask]
        time = bmt_outcome["ftime"][mask]
        times, cif_estimates, conf_int = cumulative_incidence_competing_risks(
            event,
            time,
            conf_type="log-log",
        )
        for i, (cif, ci, ax) in enumerate(
            zip(cif_estimates[1:], conf_int[1:], axs[:, j]), start=1
        ):
            ax.step(times, cif, where="post")
            ax.fill_between(times, ci[0], ci[1], alpha=0.25, step="post")
            ax.set_title(f"{disease}: {status_labels[i]}", size="small")
            ax.grid()

    for ax in axs[-1, :]:
        ax.set_xlabel("time $t$")

    for ax in axs[:, 0]:
        ax.set_ylim(0, 1)
        ax.set_ylabel("Probability of event before time $t$")
    
    [Figure: bmt-cumulative-incidence-by-diagnosis.svg]

    The left column shows the estimated cumulative incidence curves (solid lines) for patients diagnosed with ALL, while the right column shows the curves for patients diagnosed with AML, along with their 95% pointwise confidence intervals. The plot indicates that the estimated probability of relapse at $t=40$ days is more than three times higher for patients diagnosed with ALL compared to AML.

    If you want to run the examples above yourself, you can execute them interactively in your browser using binder.

    • k-d-w.org/blog/2025/02/scikit-survival-0.24.0-released/

    • Michael Meeks: 2025-02-26 Wednesday

      news.movim.eu / PlanetGnome • 26 February

    • Mail and chat chewage.
    • Published the next strip: umbrella organizations, and financial stewards for your project:
    • meeksfamily.uk/~michael/blog/2025-02-26.html

    • Michael Meeks: 2025-02-25 Tuesday

      news.movim.eu / PlanetGnome • 25 February

    • Up extremely early; managed to clear out mail & admin before most people arrived, chat with Lily, new, faster format planning call.
    • Lunch. Plugged away at some profiling / performance features: what is it that burns CPU in the browser? The Bitwarden browser plugin re-re-scanning the whole DOM on each keystroke looking for auto-complete magic in shadow DOMs - amazing.
    • meeksfamily.uk/~michael/blog/2025-02-25.html