people.kernel.org


from Konstantin Ryabitsev

If you need to run a CentOS/RHEL 7 system where GnuPG is stuck on version 2.0, you can use the gnupg22-static package I'm maintaining on Fedora COPR for our own needs.

It installs into /opt/gnupg22 so it doesn't clash with the version of GnuPG installed and used by the system.

To start using the gnupg22-static binaries, you will need to first enable the COPR repository:

# yum install yum-plugin-copr
# yum copr enable icon/lfit
# yum install gnupg22-static

The static compilation process is not perfect, because it hardcodes some defaults to point to the buildroot locations (which don't exist in an installed RPM), so you will need to tell your gpg binary where its auxiliary programs live by creating a file called ~/.gnupg/gpg.conf-2.2 with the following content:

agent-program   /opt/gnupg22/bin/gpg-agent
dirmngr-program /opt/gnupg22/bin/dirmngr

Now you just need to add a couple of aliases to your ~/.bash_profile:

  alias gpg="/opt/gnupg22/bin/gpg"
  alias gpg2="/opt/gnupg22/bin/gpg"

Alternatively, you can list /opt/gnupg22/bin earlier in the path:

export PATH=/opt/gnupg22/bin:$PATH
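
To confirm that the right binary is now being picked up, a quick sanity check is to ask for the version (the exact version string depends on the current package build, but it should report a 2.2.x release):

$ gpg --version | head -n1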

You should now be able to enjoy GnuPG-2.2 features such as support for ECC keys and Web Key Directories.

 

from Benson Leung

This issue came up recently for a high-profile new gadget that has made the transition from Micro-USB to USB-C in its latest version, the Raspberry Pi 4. See the excellent blog post by Tyler (aka scorpia): https://www.scorpia.co.uk/2019/06/28/pi4-not-working-with-some-chargers-or-why-you-need-two-cc-resistors/

The short summary is that bad things (no charging) happen if the CC1 and CC2 pins are shorted together anywhere in a USB-C system that is not an audio accessory. When combined with more capable cables (handling SuperSpeed data, or 5A power), this configuration will cause compliant chargers to provide 0V instead of 5V to the Pi.

The Raspberry Pi folks made a very common USB-C hardware design mistake that I have personally encountered dozens of times in prototype hardware and in real gear that was sold to consumers.

What is unique about this case is that Raspberry Pi has posted schematics (thanks, open hardware!) of their board that very clearly show the error.

Excerpt from the reduced Pi4 Model B schematics, from https://www.scorpia.co.uk/wp-content/uploads/2019/06/image-300x292.png

Both of the CC pins in the Pi4 schematic above are tied together on one end of resistor R79, which is a 5.1 kΩ pulldown.

Contrast that to what the USB Type-C Specification mandates must be done in this case.

USB Type-C's Sink Functional Model for CC1 and CC2, from USB Type-C Specification 1.4, Section 4.5.1.3.2

Each CC gets its own distinct Rd (5.1 kΩ), and it is important that they are distinct.

The Raspberry Pi team made two critical mistakes here. The first is that they designed this circuit themselves, perhaps trying to do something clever with current level detection, but failing to do it right. Instead of trying to come up with some clever circuit, hardware designers should simply copy the figure from the USB-C Spec exactly. Figure 4-9, shown above, isn't simply a rough guideline of one way of making a USB-C receptacle. It's actually normative, meaning mandatory, required by the spec in order to call your system a compliant USB-C power sink. Just copy it.

The second mistake is that they didn't actually test their Pi4 design with advanced cables. I get it, the USB-C cable situation is confusing and messy, and I've covered in detail here how many different kinds of cables there are. However, cables with e-marker chips (the kind that would cause problems with the Pi4's mistake) are not that uncommon. Every single Apple MacBook since 2016 has shipped with a cable with an e-marker chip. The fact that no QA team inside of Raspberry Pi's organization caught this bug indicates they only tested with the simplest kind of USB-C cable.

Raspberry Pi, you can do better. I urge you to correct your design as soon as you can so you can be USB-C compliant.

 

from Konstantin Ryabitsev

Imagine you are at a conference somewhere and a person you run across tells you about a project that you find interesting.

“Here,” they say, “I can share it with you if you've got Decent. I think it's a few days stale, sorry — roaming costs here are crazy and I don't trust the hotel wifi.”

You do have the Decent app on your phone, so you scan the QR code and wait until your phone shows you the usual “replica complete” checkmark. You don't even know if your phones used NFC, Bluetooth, or WiFi for this little chat (probably not WiFi, because it kinda sucks at all conferences), but the data went straight from their phone to yours without needing to hit any other systems on the net.

When you get to your hotel room, you decide to check out the project details. You open the Decent app on your phone and it shows a short ID code (xPqat3z). You can use it to replicate the project straight from your phone. You open up your travel laptop and run:

$ decent replicate xPqat3z
Looking for xPqat3z...found project "fizzbuzz" on "My Phone".
Cloning git...done.
Replicating ssb chains...done.

Since both your laptop and your phone are Bluetooth-paired, you are able to grab the project straight from your phone without having to hit the net again. You poke around the git tree, try to get things to work, but something is not quite right. You do have a fairly old version of libsnafu installed, so maybe that's the cause?

You run “decent tui” and search for “libsnafu”, which shows that someone has hit that very same problem just two days ago and opened a new issue, but there is no follow up yet.

Or is there?

You exit the tui and run:

[~/fizzbuzz.git]$ decent pull
Found .decent with 5 pub servers.
Establishing the fastest pub server to use...done.
Joining pub.kernel.org:80 using the invite code...done.
Updating git...1c8b2e7..620b5e2...done.
Updating ssb chains...done.
- 1 new participant
- 15 new conversations
- 12 new commits in 2 branches
- 15 patch updates (2 new)
- 9 issue updates (1 new)

Ding, when you view the libsnafu issue again you see that there have been new updates since it was created 2 days ago (the person you replicated from did say their replica was a bit stale). There is even a proposed patch that is supposed to fix the library compatibility problem.

You hit “enter” on the patch to review it. Seems like a straightforward fix, and you're happy to see that there are already a couple of Tested-by trailers from the usual CI bots, and a Reviewed-by from Taylor Thompson, the person you spoke with earlier today — in fact, this Reviewed-by has a timestamp of only a few minutes ago. You guess Taylor is catching up on some work before dinner as well.

You type in “:apply snafutest” and decent automatically creates a snafutest branch, checks it out, and applies the proposed patch on top of it. Presto, fizzbuzz finally builds and works for you.

Being a good citizen, you decide to comment on the issue and add your own Tested-by. Since it's your first time participating in this project, you need to join first:

[~/fizzbuzz.git]$ decent join
Creating new SSB keypair...done.
Starting SSB replication agent...done.
Your name [Alex Anderson]: Alex Anderson
Device identifier: Travel laptop
Self-identifying as Alex Anderson (Travel Laptop)
(Required) Agree with terms stated in COPYING? [y/n/view]: y
Adding COPYING agreement record...done.
(Required) Agree with terms stated in COVENANT? [y/n/view]: y
Adding COVENANT agreement record...done.
Cross-certify? [SSB/PGP/Keybase/None]: PGP
Adding PGP public key record...done.
Adding signed feed-id record...
Enter PGP key passphrase: *********
Done.

Now that you've initialized your own developer chain, you can comment on the issue. You give it a thumbs-up, add your own Tested-by to the proposed patch, and join the #fizzbuzz-users and #fizzbuzz-dev channels. All of these actions are simply added records to your local SSB feed, which gets replicated to the pub server you'd joined earlier.

Other members of the project will automatically start getting your SSB feed updates either from the pub server they joined, or from other developers they are following. If a pub server becomes unavailable, anyone who's ever run “decent pull” will have replicas of all participating developer and bot feeds (which means full copies of all issues, patches, developer discussions, and CI reports — for the entirety of the project's existence). They can switch to a different pub server, set up their own, or just replicate between developers using the SSB gossip protocol that powers it all behind the scenes.

What is this magic?

The “decent” tool is fiction, but the SSB framework I'm describing is not. SSB stands for “Secure Scuttlebutt” (it's nautical slang for “gossip,” so please stop guffawing). SSB is a distributed gossip protocol built on the concept of replicating individual “sigchains,” which are very similar in concept to git. Each record references the hash of the previous record, and SSB uses an ECC key to cryptographically sign every new entry, such that the entire chain is fully verifiable and attestable. Unless someone has access to the ECC secret key created at the beginning of the SSB chain, they cannot add new entries — and once the chain has been replicated anywhere, all existing entries are effectively immutable (replication simply breaks if any of the existing records are modified).
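
To make the sigchain idea more concrete, here is a toy sketch of an append-only, signed hash chain in C. It assumes libsodium for the hashing and Ed25519 signatures (link with -lsodium); the record layout, field names and helper functions are invented for illustration and are not the actual SSB wire format:

/* Toy append-only sigchain sketch using libsodium (cc sigchain.c -lsodium).
 * Record layout and helpers are illustrative only, not the SSB format. */
#include <sodium.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct record {
        unsigned char prev[crypto_generichash_BYTES]; /* hash of previous record */
        char body[256];                               /* payload (issue, comment, ...) */
        unsigned char sig[crypto_sign_BYTES];         /* signature over prev + body */
};

/* Hash a whole record so the next entry can reference it. */
static void record_hash(const struct record *r, unsigned char *out)
{
        crypto_generichash(out, crypto_generichash_BYTES,
                           (const unsigned char *)r, sizeof(*r), NULL, 0);
}

/* Append a new record: link it to its predecessor and sign it. */
static void append(struct record *r, const struct record *prev,
                   const char *body, const unsigned char *sk)
{
        memset(r, 0, sizeof(*r));
        if (prev)
                record_hash(prev, r->prev);
        snprintf(r->body, sizeof(r->body), "%s", body);
        crypto_sign_detached(r->sig, NULL, (const unsigned char *)r,
                             offsetof(struct record, sig), sk);
}

/* Verify one link: the signature and the back-reference must both hold. */
static int verify(const struct record *r, const struct record *prev,
                  const unsigned char *pk)
{
        unsigned char h[crypto_generichash_BYTES] = { 0 };

        if (prev)
                record_hash(prev, h);
        if (memcmp(r->prev, h, sizeof(h)) != 0)
                return -1;
        return crypto_sign_verify_detached(r->sig, (const unsigned char *)r,
                                           offsetof(struct record, sig), pk);
}

int main(void)
{
        unsigned char pk[crypto_sign_PUBLICKEYBYTES];
        unsigned char sk[crypto_sign_SECRETKEYBYTES];
        struct record feed[2];

        if (sodium_init() < 0)
                return 1;
        crypto_sign_keypair(pk, sk);

        append(&feed[0], NULL, "issue: fizzbuzz breaks with old libsnafu", sk);
        append(&feed[1], &feed[0], "Tested-by: Alex Anderson", sk);

        printf("chain valid: %s\n",
               (verify(&feed[0], NULL, pk) == 0 &&
                verify(&feed[1], &feed[0], pk) == 0) ? "yes" : "no");
        return 0;
}

The point is simply that every entry commits to the hash of its predecessor and carries a signature made with the feed owner's secret key, so modifying any historical record invalidates every later link, which is exactly what makes replicated copies verifiable.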

The sigchains are only part of the story — SSB also offers a decentralized replication protocol that works hard to make sure that there is no single point of trust and no single point of failure. It is able to replicate using “pub” servers that merely work as convenient mediators, but are unnecessary for the overall health of the SSB fabric. SSB replication can be done peer-to-peer via local network, over the Internet, via Tor, sneakernet, or anything at all that is able to send and receive bits.

The end-tool on the client uses these individual feeds to assemble a narrative, using message-id cross-references to construct threads of conversations. SSB is envisioned as a fully-private and fully-decentralized social network where each participating individual shares an immutable activity record choosing how much to share publicly, how much to share with specific individuals, and how much to keep fully private.

I suggest we co-opt SSB for free software development to make it truly decentralized, self-archiving, and fully attestable in all developer interactions.

What problem are you solving?

If you've read my previous entries, you know that we've been working hard to archive and preserve mailing list discussions on lore.kernel.org. Mailing lists have served us well, but their downsides are very obvious:

  • email started out as decentralized, but the vast majority of it now flows through the same handful of providers (Google, Amazon, Microsoft). It's becoming more and more difficult to set up your own mail server and expect that mail you send out will be properly accepted and delivered by the “big guns.” Did you set up SPF? DKIM? DMARC? ARC-Seal? Is your IP blacklisted on RBL? Are you sure? Have you checked today?
  • Receiving email is also frustrating regardless of whether you are using your own mail server or relying on one of the “big-gun” providers. If you're not running a spamchecker, you are probably wasting some part of your day dealing with spam. If you do filter your mail, then I hope you check your spam folder regularly (in which case you are still wasting the same amount of time on it, just less frequently and in longer chunks at once).
  • Mailing list servers are single points of failure that have to send out amazing amounts of redundant data to all subscribers. If a mailing list provider becomes unavailable, this basically kills all project discussions until a new mailing list is set up and everyone re-subscribes. Usually, this also results in the loss of previous archives, because everyone assumes someone else has a full copy.
  • Mailing lists are lossy. If your mail starts bouncing for some reason (e.g. due to a full inbox), you usually end up unsubscribed and miss out on potentially important conversations. Unless you go back and check the archives, you may never become aware of what you missed.
  • Mail clients routinely mangle structured data. Anyone who's ever had to send out a patch is aware of the long-ish “how to configure your mail client so it doesn't mangle patches” section in the git docs.
  • Even if you do manage to successfully send patches, sending any other kind of structured data is the wild west. Bot reports, automated issue notifications, etc, attempt to present data as both human- and machine-readable and largely fail at both.
  • Everyone has pretty much given up on email encryption and attestation. PGP signatures in email are mostly treated like noise, because all clients kinda suck at PGP, and, more importantly, meaningful trust delegation is hard.

Duh, that's why nobody uses email

Extremely few projects still use email for software development. The Kernel is obviously an important exception to this, among a few others, and it's usually the kind of thing people like to mention to point out how behind-the-times kernel developers are. They should stop acting like such dinosaurs, get with the program and just start using Git..b already!

However, using Git..b obviously introduces both a single point of failure and a single point of trust. Git repositories may be decentralized, but commits are merely the final product of a lot of developer back-and-forth that ends up walled-in inside the beautiful Git..b garden. You can export your project from Git..b, but very few people bother to do so, and almost nobody does it on a regular basis.

If a maintainer steps away and all development moves to a different fork, the project loses part of its history that is not committed to git, because all its issues, CI test results, pull requests and conversations are now split between the old fork and the new fork. If the original developer has a personal crisis and purges their original repository, that part of the project history is now forever gone, even if the code remains.

Furthermore, if you've been around for a while, you've seen beautiful gardens come and go. Before Github there was Sourceforge, which at some point poisoned its beautiful wells by bundling adware with binary downloads. Google Code has come and gone, like most Google things do. Github has seen a significant exodus of projects to Gitlab after it got acquired by Microsoft, and there's certainly no guarantee that Gitlab won't be acquired by some other $TechGiant looking to spruce up its open-source community image.

Git is decentralized and self-archiving. Mailing lists... sort-of are — at least we are trying to keep them that way, but it's becoming more and more difficult. Even those projects that use mailing lists for patches may not use them for issue tracking or CI reports (for example, not all Bugzilla activity goes to mailing lists and Patchwork allows attaching CI reports directly to patches using its REST API).

I think it's way past due time for us to come up with a solution that would offer a decentralized, self-archiving, fully attestable, “cradle-to-grave” development platform that covers all aspects of project development and not just the code. It must move us away from mailing lists, but avoid introducing single points of trust, authority, and failure.

And you think SSB is it?

I believe SSB offers us a usable framework that we can build on to achieve this goal. The concept of sigchains is very easy to convey due to their close resemblance to git, and the protocol's decentralized, mesh-like P2P replication nature is an important feature that will help us avoid introducing single points of failure. Every participant receives the full history of the project, same as currently every participant receives the full history of the project's code when they clone the git repository.

In SSB, every sigchain (“feed”) is tied to a single identity, which is usually a device belonging to a real person, though it can also be a project-specific feed used by a bot. Developers would naturally have multiple identities (“work laptop”, “phone”, “travel laptop”) and they can add new ones and abandon old ones as they add work environments or lose access to old devices. The feeds can be authenticated by each individual developer by cross-signing them with another identity framework (Keybase, PGP, etc), or they can remain fully pseudonymous.

The important part here is that once an identity is established, all records created by that identity are attestable to the same person or entity that was in possession of the private ECC key at the time the feed was created. When a maintainer applies someone's patches to their git tree, simply referencing the SSB record-id of the patch as part of the commit message is enough to provide a full immutable attestation chain for that code. It's like Signed-off-by on very powerful drugs.
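
As a purely hypothetical illustration (the trailer name and the record-id placeholder are made up here; no such convention exists yet), such a reference could be just one more trailer in the commit message:

  Tested-by: Alex Anderson <alex@example.org>
  SSB-Record-Id: %<base64-hash-of-the-patch-record>.sha256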

Spammy feeds can be excluded using a blocklist file in the project repository, or a project can choose to have an allowlist explicitly listing authorized feeds (as long as they provide instructions on how to request addition to that list for the purposes of participation). Developers violating established community guidelines can be terminated from the project by adding a record indicating that their feeds should be replicated up to a specific entry and no further.

Since SSB relies on cryptographic keypairs by design, it is easy to set up fully private discussion groups that are end-to-end encrypted to all group participants. This makes it easy to discuss sensitive subjects like security vulnerabilities without needing to rely on any other means of communication or any other privacy tools outside of what is already provided by SSB.

(We'll ignore for the moment the fact that all implementations of SSB are written in Javascript/NPM — wait, don't go, hear me out! — since it uses standard crypto and the records themselves are json, everything about SSB is easily portable.)

Won't it raise the entry barrier?

I am acutely aware that such a system would significantly raise the participation barrier. It's one thing to open an issue or send a pull request on Git..b, attach a patch to a Bugzilla entry, or send an email to a mailing list. We cannot expect that a “drive-by” contributor would install a client tool, replicate potentially tens of gigabytes of individual developer feeds, and create their own SSB identity simply to open an issue or submit a single patch. We would need full-featured web clients that would allow someone to browse projects in a similar fashion as they would browse them on Git..b, including viewing issues, submitting bug reports, and sending patches and pull requests.

The main distinction from Git..b here is that these web clients — let's call them “community bridges” — would merely be rich API translation endpoints contributing to fully distributed projects without locking developers into any walled gardens. They would enable collaboration without introducing central dependencies and points of failure, and anyone choosing to participate on their own terms using their own free software stack (e.g. with our fictional decent tool) would be fully empowered to do so. In fact, Git..b and others can, too, become community bridges and allow their clients to participate in distributed projects for a truly “cross-garden” development experience.

(The web bridge would necessarily need to manage the contributor's identity to create their sigchain feed, but for “drive-by” contributions this would be a reasonable trade-off. Anyone can decide to switch to a local client and start a new identity at any time if they feel they can no longer trust the bridge they are using.)

I'm intrigued. What happens next?

I've been mulling this over for a while now, and this is really just the first brain dump of my thoughts. At this point, I need everyone to point out all the possible ways why this wouldn't work (much more appreciated if it's followed by a “but it might work if...”). I realize that there is no way to leave a comment here, but you can reach out to me via either floss.social or by emailing me at mricon@kernel.org.

The Linux development community has already given us a powerful distributed development tool in the form of git, and I firmly believe that it is able to deliver a git satellite tool that would encompass all aspects of project development beyond just code. I hope that by outlining my thoughts here I'll be able to jumpstart the necessary discussion that would eventually get us there.

PS: The title of this post references a talk by Greg KH titled Patches carved into stone tablets that goes into details on why kernel developers still use mailing lists for everything.

 

from Konstantin Ryabitsev

The mail archiving system at lore.kernel.org uses public-inbox, which relies on git as the mechanism to store messages. This makes the entire archive collection very easy to replicate using grokmirror — the same tool we use to mirror git.kernel.org repositories across multiple worldwide frontends.

Setting up

It doesn't take a lot to get started. First, install grokmirror either from pip:

pip install grokmirror

or from your distro repositories:

dnf install python3-grokmirror

Next, you will need a config file and a location where you'll store your copy (keep in mind, at the time of writing all of the archives take up upwards of 20GB):

[lore.kernel.org]
# Use the erol mirror instead of lore directly
site = https://erol.kernel.org
manifest = https://erol.kernel.org/manifest.js.gz
toplevel = /path/to/your/local/archive
mymanifest = %(toplevel)s/manifest.js.gz
log = %(toplevel)s/pull.log
pull_threads = 2

Save this file into lore.conf and just run:

grok-pull -v -c lore.conf

The initial clone is going to take a long time, but after it is complete, consecutive runs of grok-pull will only update those repositories that have changed. If new repositories are added, they will be automatically cloned and added to your mirror of the archive.
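
To keep the mirror fresh after the initial clone, grok-pull can simply be run periodically, for example from cron. A minimal sketch, assuming the config above was saved as /etc/lore.conf and grok-pull is on cron's PATH:

*/15 * * * * grok-pull -c /etc/lore.conf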

Note: this by itself is not enough to run public-inbox on your local system, because there's a lot more to public-inbox than just git archives of all messages. For starters, the archives would need to be indexed into a series of sqlite3 and xapian databases, and the end-result would take up a LOT more than 20GB.

Future work

We are hoping to fund the development of a set of tools around public-inbox archives that would allow you to do cool stuff with submitted patches without needing to subscribe to LKML or any other list archived by lore.kernel.org. We expect this would be a nice feature that various CI bots can use to automatically discover and test patches without needing to bother about SMTP and incoming mail processing. If you would like to participate, please feel free to join the public-inbox development list.

 

from Konstantin Ryabitsev

When news of the TCP_SACK panic vulnerability came out, we followed much of the world in applying the “sledgehammer” mitigation until updated kernels became available and we had a chance to perform updates and reboots:

echo 0 > /proc/sys/net/ipv4/tcp_sack
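
The same mitigation can also be applied via sysctl and made persistent across reboots until fixed kernels are installed; the file name below is just a convention and may differ per distro:

sysctl -w net.ipv4.tcp_sack=0
echo "net.ipv4.tcp_sack = 0" > /etc/sysctl.d/99-tcp-sack.conf
sysctl --system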

This has largely gone without any significant impact, except in one very specific configuration in AWS, where a TLS Listener is attached to an NLB. Normally, you'd use an ALB, not an NLB for https termination, but in this particular case we needed to serve both https and gitd traffic from the same IP address, which is difficult to achieve with an ALB.

Shortly after turning off tcp_sack, our external monitoring service started reporting intermittent availability alerts for the https endpoint — the check would fail once every 10 minutes or so, but then almost immediately recover. All other checks for that server were reporting green, so the sysops on-call staff ascribed this to the fact that the server is located in the ap-southeast-1 zone, which routinely sees high latency and blips in availability when monitored from North American endpoints.

However, after the situation had persisted for over 24 hours and we started receiving complaints from the clients, it became clear that something was going wrong with TLS termination. Since the only change done to those systems was the tcp_sack modification, that was the first thing we rolled back, with immediate dramatic results:

Availability graph

The graph is a bit misleading when it shows near-solid red, because availability appeared to be very intermittent and was more apparent with some web clients than with others — e.g. accessing the site with Firefox almost always succeeded (needless to say, when the site “obviously works just fine” as far as the troubleshooter is concerned, that seriously muddles the issue).

From what we can surmise, if the client didn't immediately send the payload after completing the TLS handshake, that tcp session had a high chance of eventually timing out. I'm not 100% sure where tcp_sack fits into the exchange between the AWS NLB+TLS and the EC2 instance, since we have no visibility into any tcp traffic after it leaves the box, but obviously it makes use of selective acks, or it wouldn't have had such a dramatic effect when we turned those off. We equally have no explanation why check latency dropped by 100ms during the same time period.

Unfortunately, we couldn't afford to keep the affected systems in their broken state for in-depth troubleshooting, but hopefully this experience is useful to others facing similar issues.

 

from Christian Brauner

Introduction (CVE-2019-5736)

Today, Monday, 2019-02-11, 14:00:00 CET CVE-2019-5736 was released:

The vulnerability allows a malicious container to (with minimal user interaction) overwrite the host runc binary and thus gain root-level code execution on the host. The level of user interaction is being able to run any command (it doesn't matter if the command is not attacker-controlled) as root within a container in either of these contexts:

  • Creating a new container using an attacker-controlled image.
  • Attaching (docker exec) into an existing container which the attacker had previous write access to.

I've been working on a fix for this issue over the last couple of weeks together with Aleksa, a friend of mine and maintainer of runC. When he notified me about the issue in runC, we tried to come up with an exploit for LXC as well, and though harder, it is doable. I was interested in the issue for technical reasons and figuring out how to reliably fix it was quite fun (with a proper dose of pure hatred). It also caused me to finally write down some personal thoughts I had for a long time about how we are running containers.

What are Privileged Containers?

At a first glance this is a question that is probably trivial to anyone who has a decent low-level understanding of containers. Maybe even most users by now will know what a privileged container is. A first pass at defining it would be to say that a privileged container is a container that is owned by root. Looking closer this seems an insufficient definition. What about containers using user namespaces that are started as root? It seems we need to distinguish between what ids a container is running with. So we could say a privileged container is a container that is running as root. However, this is still wrong. Because “running as root” can either be seen as meaning “running as root as seen from the outside” or “running as root from the inside” where “outside” means “as seen from a task outside the container” and “inside” means “as seen from a task inside the container”.

What we really mean by a privileged container is a container where the semantics for id 0 are the same inside and outside of the container ceteris paribus. I say “ceteris paribus” because using LSMs, seccomp or any other security mechanism will not cause a change in the meaning of id 0 inside and outside the container. For example, a breakout caused by a bug in the runtime implementation will give you root access on the host.

An unprivileged container then simply is any container in which the semantics for id 0 inside the container are different from id 0 outside the container. For example, a breakout caused by a bug in the runtime implementation will not give you root access on the host by default. This should only be possible if the kernel's user namespace implementation has a bug.

The reason why I like to define privileged containers this way is that it also lets us handle edge cases. Specifically, the case where a container is using a user namespace but a hole is punched into the idmapping at id 0 aka where id 0 is mapped through. Consider a container that uses the following idmappings:

id: 0 100000 100000

This instructs the kernel to setup the following mapping:

id: container_id(0) -> host_id(100000)
id: container_id(1) -> host_id(100001)
id: container_id(2) -> host_id(100002)
.
.
.

container_id(99999) -> host_id(199999)

With this mapping it's evident that container_id(0) != host_id(0). But now consider the following mapping:

id: 0 0 1
id: 1 100001 99999

This instructs the kernel to setup the following mapping:

id: container_id(0) -> host_id(0)
id: container_id(1) -> host_id(100001)
id: container_id(2) -> host_id(100002)
.
.
.

container_id(99999) -> host_id(199999)

In contrast to the first example this has the consequence that container_id(0) == host_id(0). I would argue that any container that at least punches a hole for id 0 into its idmapping up to specifying an identity mapping is to be considered a privileged container.
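
For reference, this is roughly how such a hole-punched mapping could be expressed in an LXC container configuration (assuming the lxc.idmap syntax of LXC 3.x; the numbers mirror the second example above and are purely illustrative):

# map container id 0 straight through to host id 0
lxc.idmap = u 0 0 1
lxc.idmap = g 0 0 1
# shift everything else into an unprivileged range
lxc.idmap = u 1 100001 99999
lxc.idmap = g 1 100001 99999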

As a sidenote, Docker containers run as privileged containers by default. There is usually some confusion where people think because they do not use the --privileged flag that Docker containers run unprivileged. This is wrong. What the --privileged flag does is to give you even more permissions by e.g. not dropping (specific or even any) capabilities. One could say that such containers are almost “super-privileged”.

The Trouble with Privileged Containers

The problem I see with privileged containers is essentially captured by LXC's and LXD's upstream security position which we have held since at least 2015 but probably even earlier. I'm quoting from our notes about privileged containers:

Privileged containers are defined as any container where the container uid 0 is mapped to the host's uid 0. In such containers, protection of the host and prevention of escape is entirely done through Mandatory Access Control (apparmor, selinux), seccomp filters, dropping of capabilities and namespaces.

Those technologies combined will typically prevent any accidental damage of the host, where damage is defined as things like reconfiguring host hardware, reconfiguring the host kernel or accessing the host filesystem.

LXC upstream's position is that those containers aren't and cannot be root-safe.

They are still valuable in an environment where you are running trusted workloads or where no untrusted task is running as root in the container.

We are aware of a number of exploits which will let you escape such containers and get full root privileges on the host. Some of those exploits can be trivially blocked and so we do update our different policies once made aware of them. Some others aren't blockable as they would require blocking so many core features that the average container would become completely unusable.

[...]

As privileged containers are considered unsafe, we typically will not consider new container escape exploits to be security issues worthy of a CVE and quick fix. We will however try to mitigate those issues so that accidental damage to the host is prevented.

LXC's upstream position for a long time has been that privileged containers are not and cannot be root safe. For something to be considered root safe it should be safe to hand root access to third parties or tasks.

Running Untrusted Workloads in Privileged Containers

is insane. That's about everything that this paragraph should contain. The fact that the semantics for id 0 inside and outside the container are identical entails that any meaningful container escape will have the attacker gain root on the host.

CVE-2019-5736 Is a Very Very Very Bad Privilege Escalation to Host Root

CVE-2019-5736 is an excellent illustration of such an attack. Think about it: a process running inside a privileged container can rather trivially corrupt the binary that is used to attach to the container. This allows an attacker to create a custom ELF binary on the host. That binary could do anything it wants:

  • could just be a binary that calls poweroff
  • could be a binary that spawns a root shell
  • could be a binary that kills other containers when called again to attach
  • could be suid cat
  • .
  • .
  • .

The attack vector is actually slightly worse for runC due to its architecture. Since runC exits after spawning the container it can also be attacked through a malicious container image. Which is super bad given that a lot of container workload workflows rely on downloading images from the web.

LXC cannot be attacked through a malicious image since the monitor process (a singleton per container) never exits during the container's life cycle. Since the kernel does not allow modifications to running binaries it is not possible for the attacker to corrupt it. When the container is shut down or killed the attacking task will be killed before it can do any harm. Only when the last process running inside the container has exited will the monitor itself exit. This has the consequence that if you run privileged OCI containers via our oci template with LXC you are not vulnerable to malicious images. Only the vector through the attaching binary still applies.

The Lie that Privileged Containers can be safe

Aside from mostly working on the Kernel I'm also a maintainer of LXC and LXD alongside Stéphane Graber. We are responsible for LXC – the low-level container runtime – and LXD – the container management daemon using LXC. We have made a very conscious decision to consider privileged containers not root safe. Two main corollaries follow from this:

  1. Privileged containers should never be used to run untrusted workloads.
  2. Breakouts from privileged containers are not considered CVEs by our security policy.

It still seems a common belief that if we all just try hard enough, using privileged containers for untrusted workloads is safe. This is not a promise that can be made good upon. A privileged container is not a security boundary. The reason for this is simply what we looked at above: container_id(0) == host_id(0). It is therefore deeply troubling that this industry is happy to let users believe that they are safe and secure using privileged containers.

Unprivileged Containers as Default

As upstream for LXC and LXD we have been advocating the use of unprivileged containers by default for years, way before anyone else did. Our low-level library LXC has supported unprivileged containers since 2013 when user namespaces were merged into the kernel. With LXD we have taken it one step further and made unprivileged containers the default and privileged containers opt-in, for that very reason: privileged containers aren't safe. We even allow you to have per-container idmappings to make sure that not just each container is isolated from the host but also all containers from each other.

For years we have been advocating for unprivileged containers at conferences, in blog posts, and whenever we have spoken to people, but somehow this whole industry has chosen to rely on privileged containers.

The good news is that we are seeing changes as people become more familiar with the perils of privileged containers. Let this recent CVE be another reminder that unprivileged containers need to be the default.

Are LXC and LXD affected?

I have seen this question asked all over the place so I guess I should add a section about this too:

  • Unprivileged LXC and LXD containers are not affected.

  • Any privileged LXC and LXD container running on a read-only rootfs is not affected.

  • Privileged LXC containers in the definition provided above are affected. Though the attack is more difficult than for runC. The reason for this is that the lxc-attach binary does not exit before the program in the container has finished executing. This means an attacker would need to open an O_PATH file descriptor to /proc/self/exe, fork() itself into the background and re-open the O_PATH file descriptor through /proc/self/fd/<O_PATH-nr> in a loop as O_WRONLY and keep trying to write to the binary until such time as lxc-attach exits. Before that it will not succeed since the kernel will not allow modification of a running binary.

  • Privileged LXD containers are only affected if the daemon is restarted other than for upgrade reasons. This should basically never happen. The LXD daemon never exits so any write will fail because the kernel does not allow modification of a running binary. If the LXD daemon is restarted because of an upgrade the binary will be swapped out and the file descriptor used for the attack will write to the old in-memory binary and not to the new binary.

Chromebooks with Crostini using LXD are not affected

Chromebooks, which use LXD as their default container runtime, are not affected. First of all, all binaries reside on a read-only filesystem and second, LXD does not allow running privileged containers on Chromebooks through the LXD_UNPRIVILEGED_ONLY flag. For more details see this link.

Fixing CVE-2019-5736

To prevent this attack, LXC has been patched to create a temporary copy of the calling binary itself when it attaches to containers (cf. commit 6400238d08cdf1ca20d49bafb85f4e224348bf9d). To do this LXC can be instructed to create an anonymous, in-memory file using the memfd_create() system call and to copy itself into the temporary in-memory file, which is then sealed to prevent further modifications. LXC then executes this sealed, in-memory file instead of the original on-disk binary. Any compromising write operations from a privileged container to the host LXC binary will then write to the temporary in-memory binary and not to the host binary on-disk, preserving the integrity of the host LXC binary. And as the temporary, in-memory LXC binary is sealed, writes to it will also fail. To not break downstream users of the shared library this is opt-in by setting LXC_MEMFD_REXEC in the environment. For our lxc-attach binary, which is the only attack vector, this is now done by default.
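
For illustration only, here is a conceptual sketch of that memfd_create() trick (this is not LXC's actual code). It assumes a kernel with memfd sealing (3.17+) and a glibc that exposes memfd_create() (2.27+); the REEXECED environment variable is just a guard invented for this example so the program does not re-execute itself forever:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[], char *envp[])
{
        int exe, mfd;
        struct stat st;

        if (getenv("REEXECED")) {
                /* Second run: we are now executing from the sealed copy. */
                printf("running from a sealed in-memory copy of ourselves\n");
                exit(EXIT_SUCCESS);
        }

        /* Open our own on-disk binary. */
        exe = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
        if (exe < 0)
                exit(EXIT_FAILURE);
        if (fstat(exe, &st) < 0)
                exit(EXIT_FAILURE);

        /* Create an anonymous, sealable in-memory file and copy ourselves into it. */
        mfd = memfd_create("rexec-copy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
        if (mfd < 0)
                exit(EXIT_FAILURE);
        if (sendfile(mfd, exe, NULL, st.st_size) != st.st_size)
                exit(EXIT_FAILURE);

        /* Seal the copy so no further writes, grows or shrinks are possible. */
        if (fcntl(mfd, F_ADD_SEALS,
                  F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL) < 0)
                exit(EXIT_FAILURE);

        /* Re-execute from the sealed in-memory copy. A malicious write to
         * /proc/self/exe now only ever reaches this throw-away copy, never
         * the binary on the host filesystem. */
        if (setenv("REEXECED", "1", 1) < 0)
                exit(EXIT_FAILURE);
        fexecve(mfd, argv, envp);
        exit(EXIT_FAILURE); /* fexecve() only returns on error */
}

LXC's real implementation is more careful and, as described above, opt-in via LXC_MEMFD_REXEC, but the mechanism is the same: copy, seal, re-execute.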

Workloads that place the LXC binaries on a read-only filesystem or prevent running privileged containers can disable this feature by passing --disable-memfd-rexec during the configure stage when compiling LXC.

 

from mcgrof

The offlineimap woes

A long-term goal I've had for a while now was finding a reasonable replacement for offlineimap to fetch all my email for development purposes. I knew offlineimap kept dying on me with out-of-memory (OOM) errors, however it was not clear how bad the issue was. It was also not clear what I'd replace it with, until now. At least for now... I've replaced offlineimap with mbsync. Below are some details comparing both, with shiny graphs of system utilization for each; I'll provide my recipes for fetching gmail nested labels over IMAP, glance over my systemd user unit files and explain why I use them, and hint at what I'm asking Santa for in the future.

System setup when home is $HOME

I used to host my mail fetching system at home, however $HOME can get complicated if you travel often, so for more flexibility I now rely on a DigitalOcean droplet with a small dedicated volume pool for mail storage. This lets me do away with the stupid host whenever I'm tired of it, and lets me collect nice system utilization graphs without much effort.

Graphing use on offlineimap

Every now and then I'd check my logs and see how offlineimap tends to run out of memory and barf. A temporary solution I figured would work was to disable autorefresh, and instead run offlineimap once in a controlled timely loop using systemd unit timers. That solution didn't help in the end. I finally had a bit of time to check my logs carefully and also check system utilization graphs on the system over time, and to my surprise offlineimap was running out of memory every single damn time. Here's what I saw from running offlineimap for a full month:

Full month graph of offlineimap

Those spikes are a bit concerning, it's likely the system running out of memory. But let's zoom in to see how often with an hourly graph:

Hourly graph of offlineimap

Pretty much, I was OOM'ing every single damn time! The lull you see towards the end was me just getting fed up and killing offlineimap until I found a replacement.

The OOM risks

Running out of memory every now and then is one thing, but every single time is just insanity. A system always running low on memory while doing writes is an effective way to stress test a kernel, and if the stars align against you, you might even end up with a corrupted filesystem. Fortunately this puny single threaded application is simple enough so I didn't run into that issue. But it was a risk.

mbsync

mbsync is written in C, actively maintained and has mutt code pedigree. Need I say more? Hell, I'm only sad it took me so long to find out about it. mbsync works with the idea of channels; for each channel it has a master and a local store. The master is where we fetch data from, and the local store is where we stash things locally.

But in reading its documentation it was not exactly clear how I'd use it for my development purpose to fetch email off of my gmail where I used nested labels for different public mailing lists.

The documentation was also not clear on what to do when migrating and keeping old files.

mbsync migration

Yes in theory you could keep the old IMAP folder, but in practice I ran into a lot of issues. So much so, my solution to the problem was:

$ rm -rf Mail/

And just start fresh... Afraid to make the jump due to the amount of time it may take to sync one of your precious labels? Well, evaluate my different timer solution below.

mbsync for nested gmail labels

Here's what I ended up with. It demos getting mail for, say, my linux-kernel/linux-xfs and linux-kernel/linux-fsdevel mailing lists, and includes some empirical throttling to ensure you don't get punted by gmail for going over some sort of usage quota they've concocted for an IMAP connection.

# A gmail example
#
# First generic defaults
Create Slave
SyncState *

IMAPAccount gmail
CertificateFile /etc/ssl/certs/ca-certificates.crt
SSLType IMAPS
Host imap.gmail.com
User user@gmail.com
# Must be an application specific password, otherwise google will deny access.
Pass example
# Throttle mbsync so we don't go over gmail's quota: OVERQUOTA error would
# eventually be returned otherwise. For more details see:
# https://sourceforge.net/p/isync/mailman/message/35458365/
PipelineDepth 50

MaildirStore gmail-local
# The trailing "/" is important
Path ~/Mail/
Inbox ~/Mail/Inbox
Subfolders Verbatim

IMAPStore gmail-remote
Account gmail

# emails sent directly to my kernel.org address
# are stored in my gmail label "korg"
Channel korg
Master :gmail-remote:"korg"
Slave :gmail-local:korg

# An example of nested labels on gmail, useful for large projects with
# many mailing lists. We have to flatten out the structure locally.
Channel linux-xfs
Master :gmail-remote:"linux-kernel/linux-xfs"
Slave :gmail-local:linux-kernel.linux-xfs

Channel linux-fsdevel
Master :gmail-remote:"linux-kernel/linux-fsdevel"
Slave :gmail-local:linux-kernel.linux-fsdevel

# Get all the gmail channels together into a group.
Group googlemail
Channel korg
Channel linux-xfs
Channel linux-fsdevel

mbsync systemd unit files

Now, some of these mailing lists (channels in mbsync lingo) have heavy traffic, and I don't need to be fetching email off of them that often. I also have a channel dedicated solely to emails sent directly to me; those I want right away. But also... since I'm starting fresh, if I ran mbsync to fetch all my email it would mean that at some point mbsync would stall on any large label I have. I'd have to wait for those big labels before getting new email for smaller labels. For this reason, ideally I'd want to call mbsync at different intervals depending on the mailing list / mbsync channel. Fortunately mbsync locks per target local directory, and so the only missing piece was a way to configure timers / calls for mbsync in such a way that I could still journal calls / issues.

I ended up writing a systemd timer and a service unit file per mailing list. The nice thing about this, compared to good ol' cron, is that OnUnitInactiveSec=4m, for instance, will call mbsync 4 minutes after it last finished. I also end up with a central place to collect logs:

journalctl --user

Or if I want to monitor:

journalctl --user -f

For my korg label, patches / rants sent directly to me, I want to fetch mail every minute:

$ cat .config/systemd/user/mbsync-korg.timer
[Unit]
Description=mbsync query timer [korg]
ConditionPathExists=%h/.mbsyncrc

[Timer]
OnBootSec=1m
OnUnitInactiveSec=1m

[Install]
WantedBy=default.target

$ cat .config/systemd/user/mbsync-korg.service
[Unit]
Description=mbsync service [korg]
Documentation=man:mbsync(1)
ConditionPathExists=%h/.mbsyncrc

[Service]
Type=oneshot
ExecStart=/usr/local/bin/mbsync korg

[Install]
WantedBy=mail.target

However for my linux-fsdevel... I could wait at least 30 minutes for a refresh:

$ cat .config/systemd/user/mbsync-linux-fsdevel.timer
[Unit]
Description=mbsync query timer [linux-fsdevel]
ConditionPathExists=%h/.mbsyncrc

[Timer]
OnBootSec=5m
OnUnitInactiveSec=30m

[Install]
WantedBy=default.target

And the service unit:

$ cat .config/systemd/user/mbsync-linux-fsdevel.service
[Unit]
Description=mbsync service [linux-fsdevel]
Documentation=man:mbsync(1)
ConditionPathExists=%h/.mbsyncrc

[Service]
Type=oneshot
ExecStart=/usr/local/bin/mbsync linux-fsdevel

[Install]
WantedBy=mail.target

Enabling and starting systemd user unit files

To enable these unit files I just run for each, for instance for linux-fsdevel:

systemctl --user enable mbsync-linux-fsdevel.timer
systemctl --user start  mbsync-linux-fsdevel.timer
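
To verify that the timers were picked up and to see when each will fire next, the user timers can be listed (the output layout varies between systemd versions):

systemctl --user list-timers 'mbsync-*'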

Graphing mbsync

So... how did it do?

I currently have enabled 5 mbsync channels, all fetching my email in the background for me. And not a single one goes on puking with OOM. Here's what life is looking like now:

mbsync hourly

Peachy.

Long term ideals

IMAP does the job for email, it just seems utterly stupid for public mailing lists and I figure we can do much better. This is especially true in light of how much simpler it is for me to follow public code vs public email threads these days. Keep in mind how much more complicated code management is compared to the goal of just wanting to get a simple stupid email Message ID onto my local Maildir directory. I really had my hopes on public-inbox but after looking into it, it seems clear now that its main objectives are archiving — not local storage / MUA use. For details refer to this linux-kernel discussion on public-inbox with a MUA focus.

If the issue with using public-inbox for local MUA usage was that the archive was too big... it seems sensible to me to evaluate trying an even smaller epoch size, and default clients to fetch only one epoch, the latest one. That alone wouldn't solve the issue though. The way data files are stored in a Maildir makes it nearly incompatible with git. A proper evaluation of using mbox would be in order.

The social lubricant is out on the idea though, and I'm hopeful a proper, simple git-based mail solution for public emails is bound to find us soon.

 

from Christian Brauner

Introduction

Android Binder is an inter-process communication (IPC) mechanism. It is heavily used in all Android devices. The binder kernel driver has been present in the upstream Linux kernel for quite a while now.

Binder has been a controversial patchset (see this lwn article as an example). Its design was considered wrong and to violate certain core kernel design principles (e.g. a task should never touch another task's file descriptor table). Most kernel developers were not fans of binder.

Recently, the upstream binder code has fortunately been reworked significantly (e.g. it does not touch another task's file descriptor table anymore, the locking is very fine-grained now, etc.).

With Android being one of the major operating systems (OS) for a vast number of devices there is simply no way around binder.

The Android Service Manager

The binder IPC mechanism is accessible from userspace through device nodes located at /dev. A modern Android system will allocate three device nodes:

  • /dev/binder
  • /dev/hwbinder
  • /dev/vndbinder

serving different purposes. However, the logic is the same for all three of them. A process can call open(2) on those device nodes to receive an fd which it can then use to issue requests via ioctl(2)s. Android has a service manager which is used to translate addresses to bus names and only the address of the service manager itself is well-known. The service manager is registered through an ioctl(2) and there can only be a single service manager. This means once a service manager has grabbed hold of binder devices they cannot be (easily) reused by a second service manager.

Running Android in Containers

This matters as soon as multiple instances of Android are supposed to be run, since they will all need their own private binder devices. This is a use-case that arises pretty naturally when running Android in system containers. People have been doing this for a long time with LXC. A project that has set out to make running Android in LXC containers very easy is Anbox. Anbox makes it possible to run hundreds of Android containers.

To properly run Android in a container it is necessary that each container has a set of private binder devices.

Statically Allocating binder Devices

Binder devices are currently statically allocated at compile time. Before compiling a kernel, the CONFIG_ANDROID_BINDER_DEVICES option needs to be set in the kernel config (Kconfig), containing the names of the binder devices to allocate at boot. By default it is set as:

CONFIG_ANDROID_BINDER_DEVICES="binder,hwbinder,vndbinder"

To allocate additional binder devices the user needs to specify them with this Kconfig option. This is problematic since users need to know how many containers they will run at maximum and then calculate the number of devices they need so they can specify them in the Kconfig. When the maximum number of needed binder devices changes after kernel compilation the only way to get additional devices is to recompile the kernel.

Problem 1: Using the misc major Device Number

This situation is aggravated by the fact that binder devices use the misc major number in the kernel. Each device node in the Linux kernel is identified by a major and minor number. A device can request its own major number. If it does it will have an exclusive range of minor numbers it doesn't share with anything else and is free to hand out. Or it can use the misc major number. The misc major number is shared amongst different devices. However, that also means the number of minor devices that can be handed out is limited by all users of misc major. So if a user requests a very large number of binder devices in their Kconfig they might make it impossible for anyone else to allocate minor numbers. Or there simply might not be enough to allocate for itself.

Problem 2: Containers and IPC namespaces

All of those binder devices requested in the Kconfig via CONFIG_ANDROID_BINDER_DEVICES will be allocated at boot and placed in the host's devtmpfs mount usually located at /dev, or – depending on the udev(7) implementation – will be created via mknod(2) by udev(7) at boot. That means all of those devices initially belong to the host IPC namespace. However, containers usually run in their own IPC namespace separate from the host's. But when binder devices located in /dev are handed to containers (e.g. with a bind-mount) the kernel driver will not know that these devices are now used in a different IPC namespace since the driver is not IPC namespace aware. This is not a serious technical issue but a serious conceptual one. There should be a way to have per-IPC namespace binder devices.

Enter binderfs

To solve both problems we came up with a solution that I presented at the Linux Plumbers Conference in Vancouver this year; there's a video of that presentation available on YouTube.

Android binderfs is a tiny filesystem that allows users to dynamically allocate binder devices, i.e. it allows adding and removing binder devices at runtime. This means it solves problem 1. Additionally, binder devices located in a new binderfs instance are independent of binder devices located in another binderfs instance. All binder devices in binderfs instances are also independent of the binder devices allocated during boot specified in CONFIG_ANDROID_BINDER_DEVICES. This means binderfs solves problem 2.

Android binderfs can be mounted via:

mount -t binder binder /dev/binderfs

at which point a new instance of binderfs will show up at /dev/binderfs. In a fresh instance of binderfs no binder devices will be present. There will only be a binder-control device which serves as the request handler for binderfs:

root@edfu:~# ls -al /dev/binderfs/
total 0
drwxr-xr-x  2 root root      0 Jan 10 15:07 .
drwxr-xr-x 20 root root   4260 Jan 10 15:07 ..
crw-------  1 root root 242, 6 Jan 10 15:07 binder-control

binderfs: Dynamically Allocating a New binder Device

To allocate a new binder device in a binderfs instance a request needs to be sent through the binder-control device node. A request is sent in the form of an ioctl(2). Here's an example program:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/android/binder.h>
#include <linux/android/binderfs.h>

int main(int argc, char *argv[])
{
        int fd, ret, saved_errno;
        size_t len;
        struct binderfs_device device = { 0 };

        if (argc != 3)
                exit(EXIT_FAILURE);

        len = strlen(argv[2]);
        if (len > BINDERFS_MAX_NAME)
                exit(EXIT_FAILURE);

        memcpy(device.name, argv[2], len);

        fd = open(argv[1], O_RDONLY | O_CLOEXEC);
        if (fd < 0) {
                printf("%s - Failed to open binder-control device\n",
                       strerror(errno));
                exit(EXIT_FAILURE);
        }

        ret = ioctl(fd, BINDER_CTL_ADD, &device);
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        if (ret < 0) {
                printf("%s - Failed to allocate new binder device\n",
                       strerror(errno));
                exit(EXIT_FAILURE);
        }

        printf("Allocated new binder device with major %d, minor %d, "
               "and name %s\n", device.major, device.minor,
               device.name);

        exit(EXIT_SUCCESS);
}
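
Assuming the program above is saved as bfs.c and the kernel UAPI headers on the system are new enough to provide linux/android/binderfs.h, it can be built and used roughly like this (the device path and name are just examples):

gcc -o bfs bfs.c
./bfs /dev/binderfs/binder-control my-binder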

What this program does is simply open the binder-control device node and send a BINDER_CTL_ADD request to the kernel. Users of binderfs need to tell the kernel which name the new binder device should get. By default a name can only contain up to 256 chars including the terminating zero byte. The struct which is used is:

/**
 * struct binderfs_device - retrieve information about a new binder device
 * @name:   the name to use for the new binderfs binder device
 * @major:  major number allocated for binderfs binder devices
 * @minor:  minor number allocated for the new binderfs binder device
 *
 */
struct binderfs_device {
       char name[BINDERFS_MAX_NAME + 1];
       __u32 major;
       __u32 minor;
};

and is defined in linux/android/binderfs.h. Once the request is made via an ioctl(2) passing a struct binderfs_device with the name to the kernel, it will allocate a new binder device and return the major and minor number of the new device in the struct (this is necessary because binderfs allocates its major device number dynamically at boot). After the ioctl(2) returns there will be a new binder device located under /dev/binderfs with the chosen name:

root@edfu:~# ls -al /dev/binderfs/
total 0
drwxr-xr-x  2 root root      0 Jan 10 15:19 .
drwxr-xr-x 20 root root   4260 Jan 10 15:07 ..
crw-------  1 root root 242, 0 Jan 10 15:19 binder-control
crw-------  1 root root 242, 1 Jan 10 15:19 my-binder
crw-------  1 root root 242, 2 Jan 10 15:19 my-binder1

binderfs: Deleting a binder Device

Deleting binder devices does not involve issuing another ioctl(2) request through binder-control. They can be deleted via unlink(2). This means that the rm(1) tool can be used to delete them:

root@edfu:~# rm /dev/binderfs/my-binder1
root@edfu:~# ls -al /dev/binderfs/
total 0
drwxr-xr-x  2 root root      0 Jan 10 15:19 .
drwxr-xr-x 20 root root   4260 Jan 10 15:07 ..
crw-------  1 root root 242, 0 Jan 10 15:19 binder-control
crw-------  1 root root 242, 1 Jan 10 15:19 my-binder

Note that the binder-control device cannot be deleted, since this would make the binderfs instance unusable. The binder-control device will be deleted when the binderfs instance is unmounted and all references to it have been dropped.
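
Since deletion is just unlink(2), a program can also remove a device it allocated without shelling out to rm(1). A minimal sketch (the device path is just an example):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        /* Remove a binder device previously allocated in this binderfs instance. */
        if (unlink("/dev/binderfs/my-binder1") < 0) {
                perror("unlink");
                return 1;
        }

        return 0;
}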

binderfs: Mounting Multiple Instances

Mounting another binderfs instance at a different location will create a new and separate instance from all other binderfs mounts. This is identical to the behavior of devpts, tmpfs, and also – even though never merged in the kernel – kdbusfs:

root@edfu:~# mkdir binderfs1
root@edfu:~# mount -t binder binder binderfs1
root@edfu:~# ls -al binderfs1/
total 4
drwxr-xr-x  2 root   root        0 Jan 10 15:23 .
drwxr-xr-x 72 ubuntu ubuntu   4096 Jan 10 15:23 ..
crw-------  1 root   root   242, 2 Jan 10 15:23 binder-control

There is no my-binder device in this new binderfs instance since its devices are not related to those in the binderfs instance at /dev/binderfs. This means users can easily get their private set of binder devices.

binderfs: Mounting binderfs in User Namespaces

The Android binderfs filesystem can be mounted and used to allocate new binder devices in user namespaces. This has the advantage that binderfs can be used in unprivileged containers or any user-namespace-based sandboxing solution:

ubuntu@edfu:~$ unshare --user --map-root --mount
root@edfu:~# mkdir binderfs-userns
root@edfu:~# mount -t binder binder binderfs-userns/
root@edfu:~# The "bfs" binary used here is the compiled program from above
root@edfu:~# ./bfs binderfs-userns/binder-control my-user-binder
Allocated new binder device with major 242, minor 4, and name my-user-binder
root@edfu:~# ls -al binderfs-userns/
total 4
drwxr-xr-x  2 root root      0 Jan 10 15:34 .
drwxr-xr-x 73 root root   4096 Jan 10 15:32 ..
crw-------  1 root root 242, 3 Jan 10 15:34 binder-control
crw-------  1 root root 242, 4 Jan 10 15:36 my-user-binder
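
A sandboxing program that wants to do this itself rather than going through unshare(1) can perform the same sequence with unshare(2) and mount(2). Here is a minimal sketch of that idea; error handling is kept to a minimum and the mount point path is chosen arbitrarily for illustration:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a small string to a file; used for the uid/gid map setup. */
static int write_file(const char *path, const char *buf)
{
        int fd, ret;

        fd = open(path, O_WRONLY | O_CLOEXEC);
        if (fd < 0)
                return -1;

        ret = write(fd, buf, strlen(buf));
        close(fd);
        return ret < 0 ? -1 : 0;
}

int main(void)
{
        char map[64];
        uid_t uid = getuid();
        gid_t gid = getgid();

        /* New user and mount namespace, like "unshare --user --map-root --mount". */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNS) < 0) {
                perror("unshare");
                exit(EXIT_FAILURE);
        }

        /* Map the calling user and group to root inside the new user namespace. */
        snprintf(map, sizeof(map), "0 %d 1", (int)uid);
        if (write_file("/proc/self/uid_map", map) < 0)
                exit(EXIT_FAILURE);
        if (write_file("/proc/self/setgroups", "deny") < 0)
                exit(EXIT_FAILURE);
        snprintf(map, sizeof(map), "0 %d 1", (int)gid);
        if (write_file("/proc/self/gid_map", map) < 0)
                exit(EXIT_FAILURE);

        /* Mount a private binderfs instance on an example path. */
        if (mkdir("binderfs-userns", 0755) < 0 && errno != EEXIST) {
                perror("mkdir");
                exit(EXIT_FAILURE);
        }
        if (mount("binder", "binderfs-userns", "binder", 0, NULL) < 0) {
                perror("mount");
                exit(EXIT_FAILURE);
        }

        printf("binderfs mounted at binderfs-userns/\n");
        exit(EXIT_SUCCESS);
}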

Kernel Patchsets

The binderfs patchset is merged upstream and will be available when Linux 5.0 gets released. There are a few outstanding patches that are currently waiting in Greg's tree (cf. “binderfs: remove wrong kern_mount() call” and “binderfs: make each binderfs mount a new instance” in the char-misc-linus branch) and some others are queued for the 5.1 merge window. But overall it seems to be in decent shape.

 
Read more...

from Greg Kroah-Hartman

As everyone seems to like to put kernel trees up on github for random projects (based on the crazy notifications I get all the time), I figured it was time to put up a “semi-official” mirror of all of the stable kernel releases on github.com

It can be found at: https://github.com/gregkh/linux

It differs from Linus's tree at: https://github.com/torvalds/linux in that it contains all of the different stable tree branches and stable releases and tags, which many devices end up building on top of.
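
Using it works like any other git remote; for example, to follow one of the long-term stable series (the branch name here is just an illustration):

git clone https://github.com/gregkh/linux.git
cd linux
git checkout linux-4.19.y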

So, mirror away!

Also note, this is a read-only mirror; any pull requests created on it will be gleefully ignored, just as happens on Linus's github mirror.

 
Read more...

from Benson Leung

tl;dr: There are 6, and it's unfortunately very confusing for the end user.

Classic USB, from the 1.1 and 2.0 generations through 3.0, used USB-A and USB-B connectors and had a really nice property: cables were directional, and plugs and receptacles were physically distinct to indicate different capabilities. A USB 3.0-capable USB-B plug was physically larger than a 2.0 plug and would not fit into a USB 2.0-only receptacle. For the end user, this meant that as long as they had a cable that would physically connect to both the host and the device, the system would function properly, as there is only ever one kind of cable that goes from one A plug to a particular flavor of B plug.

Does the same hold for USB-C™?

Sadly, the answer is no. Cables with a USB-C plug on both ends (C-to-C), hereafter referred to as “USB-C cables”, come in several varieties. Here they are, current as of the USB Type-C™ Specification 1.4 from June 2019:

  1. USB 2.0 rated at 3A
  2. USB 2.0 rated at 5A
  3. USB 3.2 Gen 1 (5gbps) rated at 3A
  4. USB 3.2 Gen 1 (5gbps) rated at 5A
  5. USB 3.2 Gen 2 (10gbps) rated at 3A
  6. USB 3.2 Gen 2 (10gbps) rated at 5A

We have a 2 x 3 matrix, with 2 current rating levels (3A max current or 5A max current) and 3 data speeds (480mbps, 5gbps, 10gbps).

Adding a bit more detail, cables 3-6, in fact, have 10 more wires that connect end-to-end compared to the USB 2.0 ones in order to handle SuperSpeed data rates. Cables 3-6 are called “Full-Featured Type-C Cables” in the spec, and the extra wires are actually required for more than just faster data speeds.

“Full-Featured Type-C Cables” are required for the most common USB-C Alternate Mode used on PCs and many phones today, VESA DisplayPort Alternate Mode. VESA DP Alt mode requires most of the 10 extra wires present in a Full-Featured USB-C cable.

My new Pixelbook, for example, does not have a dedicated physical DP or HDMI port and relies on VESA DP Alt Mode in order to connect to any monitor. Brand new monitors and docking stations may have a USB-C receptacle in order to allow for a DisplayPort, power and USB connection to the laptop.

Suddenly, with a USB-C receptacle on both the host and the device (the monitor), and a range of 6 possible USB-C cables, the user may encounter a pitfall: they may try to use the USB 2.0 cable that came with their laptop to connect the display, and the display doesn't work, despite the plugs fitting on both ends, because those 10 wires aren't there.

Why did it come to this? This problem was created because the USB-C connector was designed to replace all of the previous USB connectors while at the same time vastly increasing what the cable could do along the power, data, and display dimensions. The new connector may be virtually impossible to plug in improperly (no USB superposition problem, no grabbing the wrong end of the cable), but sacrificed for that simplicity is the ability to intuitively know whether the system you've connected together has all of the functionality possible. The USB spec also cannot simply mandate that all USB-C cables have the maximum number of wires all the time, because that would vastly increase BOM cost in the cases where the cable is primarily used for charging.

How can we fix this? Unfortunately, it's a tough problem that has to involve user education. USB-C cables are mandated by USB-IF to bear a particular logo in order to be certified:

Image

Collectively, we have to teach users that if they need DisplayPort to work, they need to find cables with the two logos on the right.

Technically, there is something that software can do to help with the education problem. Cables 2-6 are required by the USB specification to include an electronic marker chip which contains vital information about the cable. The host should be able to read that eMarker and identify the cable's data and power capabilities. If the host sees that the user is attempting to use DisplayPort Alternate Mode with the wrong cable, rather than failing silently (i.e., the external display doesn't light up), the OS should tell the user via a notification that they may be using the wrong cable, and educate them about cables with the right logo.

This is something that my team is actively working on, and I hope to be able to show the kernel pieces necessary soon.

 
Read more...

from Konstantin Ryabitsev

Ever since the demise of Google+, many developers have expressed a desire to have a service that would provide a way to create and manage content in a format that would be richer and easier to access than email messages sent to LKML.

Today, we would like to introduce people.kernel.org, which is an ActivityPub-enabled federated platform powered by WriteFreely and hosted by very nice and accommodating folks at write.as.

Why WriteFreely?

There were many candidates, but we chose WriteFreely for the following reasons:

  • it runs on Linux
  • it is free software
  • it is federated using ActivityPub
  • it supports writing rich content using markdown
  • it offers command-line publishing tools

How is it different from kernelplanet.org?

The Kernel Planet is an aggregator of people's individual blogs. The main distinction between it and people.kernel.org is that the authors here write on topics having to do with Linux and technology in general, while there are no such restrictions in place on the Kernel Planet. You are certainly welcome to follow both!

Who can join people.kernel.org

At this time, we are aiming to roll out this service to a subset of high-profile developers, and the easiest way to do so is to offer it initially to those folks who are listed in the MAINTAINERS file.

That said, if you are not currently in the MAINTAINERS file, but think you can be a great writer for people.kernel.org, then you can simply ask someone who is in that file to “sponsor” you. Just add them as a cc on your invite request.

How to join people.kernel.org

See the about page for full details.

 
Read more...

from Mauro Carvalho Chehab

Having a number of machines here running Fedora, on April 30 I started working on migrating them to Fedora's latest version: Fedora 30.

Note: this is a re-post of a blog entry I wrote back on May 1st: https://linuxkernel.home.blog/2019/05/01/fedora-30-installation/ with one update at the end made on June 26.

First machine: a multi-monitor desktop

I started the migration on a machine with multiple monitors connected to it. Originally, when Fedora was installed on it, the GPU Kernel driver for the chipset (called DRM KMS – Kernel ModeSetting) was not yet available in Fedora's Kernel. So, the Fedora installer (Anaconda) added a nomodeset option to the Kernel parameters.

As KMS support was just arriving upstream, I built my own Kernel at the time and removed the nomodeset option.

By the time I did the upgrade, all Kernels, maybe except for the rescue one, were using KMS.

I did the upgrade the same way I did in the past (as described here), i.e. by calling:

dnf system-upgrade --release 30 --allowerasing download
dnf system-upgrade reboot

The system-upgrade had to remove pgp-tools, which currently has a broken dependency, and eclipse. The latter was because, on Fedora 29, I had modular support enabled, which made eclipse depend on a modular set of Java packages.

After booting the Kernel, I hit the first problem with the upgrade: Fedora now uses BootLoaderSpec (BLS) by default, converting the old grub.cfg file to the new BLS mode. Well, the conversion simply re-added the nomodeset option to all Kernels, which disabled the extra monitors, as X11/Wayland would need to set up the video mode the old way. At that time, I wasn't aware of BLS, so I just ran this command:

cd /boot/efi/EFI/fedora/ && cp grub.cfg.rpmsave grub.cfg

in order to restore the working grub.cfg file.

Later, in order to avoid further problems with Kernel upgrades, I installed grubby-deprecated, as recommended at https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault#Upgrade.2Fcompatibility_impact, and manually edited /etc/default/grub to comment out the line with GRUB_ENABLE_BLSCFG. I probably could have just fixed the BLS setup instead, but I opted to be conservative here.
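
In other words, after the edit the relevant line in /etc/default/grub ends up commented out, roughly like this (the exact value may differ per installation):

#GRUB_ENABLE_BLSCFG=true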

After that, I worked to re-install eclipse. For that, I had to disable modular support, as eclipse depends on an ant package version that was not yet available in the Fedora modular repositories at the time I did the upgrade.

In summary, my first install didn't go smoothly.

Second machine: a laptop

On the second machine, I ran the same dnf system-upgrade commands as on the first one. As this laptop had Fedora 29 installed from scratch just last month, I was expecting better luck.

Guess what…

… it ended up being an even worse upgrade… the machine crashed after boot!

Basically, systemd doesn’t want to mount a rootfs image if it doesn’t contain a valid file at /usr/lib/os-release. On Fedora 29, this is a soft link to another file inside /usr/lib/os.release.d. The specific file name depends on whether you installed Fedora Workstation, Fedora Server, …

During the upgrade, the directory /usr/lib/os.release.d got removed, causing the soft link to point nowhere. Because of that, after boot, systemd crashed the machine with a “brilliant” message saying that it was generating an rdsosreport.txt, crowded with information that one would need to copy somewhere else in order to analyze. Well, as it didn't mount the rootfs, copying it would be tricky, with no network and none of the usual commands found in the /bin and /sbin directories.

So, instead, I just looked at the journal file, where it said that the failure was at /lib/systemd/system/initrd-switch-root.service. That basically calls systemctl, asking it to switch the rootfs to /sysroot (which is the root filesystem as listed in /etc/fstab). Well, systemctl checks whether it recognizes os-release, and if it doesn't, instead of mounting it anyway, producing a warning and hoping for the best, it simply crashes the system!

In order to fix it, I had to use vi to manually create a Fedora 30 os-release file. Thankfully, I already had a valid os-release on my first upgraded machine, so I just typed it in manually.
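
For reference, the relevant contents of /usr/lib/os-release on a Fedora 30 Workstation machine look roughly like this (trimmed; the edition-specific strings vary):

NAME=Fedora
VERSION="30 (Workstation Edition)"
ID=fedora
VERSION_ID=30
PRETTY_NAME="Fedora 30 (Workstation Edition)"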

After that, the system booted smoothly.

Other machines

Knowing that the Fedora 30 upgrade was not trivial, I decided to take one step back, learning from my past mistakes.

So, I decided to write a small “script” with the steps to be done for the upgrade. Instead of running it as a script, you may run it line by line (starting after the set -e line). Here it is:

#!/bin/bash

#should run as root

# If one runs it as a script, this makes it abort on errors
set -e

dnf config-manager --set-disabled fedora-modular
dnf config-manager --set-disabled updates-modular
dnf config-manager --set-disabled updates-testing-modular
dnf distro-sync
dnf upgrade --refresh
(cd /usr/lib/ && cp $(readlink -f os-release) /tmp/os-release && rm os-release && cp /tmp/os-release os-release)
dnf system-upgrade --release 30 --allowerasing download
dnf system-upgrade reboot

Please notice that the script removes the os-release symlink and replaces it with a copy of the linked file. Please check that this went well; if the logic fails, you may end up crashing your machine at the next boot.

Also, please notice that it will disable Fedora modular support. Well, I don't need anything from there, so that works fine for me.

Post-install steps

Please notice that, after an upgrade, Fedora may re-enable Fedora modular. That happened to me on one machine which had Fedora 26. If you don't want to keep it enabled, you should do:

dnf config-manager --set-disabled fedora-modular
dnf config-manager --set-disabled updates-modular
dnf config-manager --set-disabled updates-testing-modular
dnf distro-sync

Results

I repeated the same procedure on several other machines, one of them being a Fedora Server, using the above script. On all of them, it went smoothly.

 
Read more...