people.kernel.org

Read the latest posts from people.kernel.org.

from Benson Leung

tl;dr: There are now 8. Thunderbolt 3 cables officially count too. It's getting hard to manage, but help is on the way.

If you recall my first cable post, there were 6 kinds of cables with USB-C plugs on both ends. I was also careful to preface that it was true as of USB Type-C™ Specification Revision 1.4, from June 2019.

Last week, the USB-IF officially published the USB Type-C™ Specification Revision 2.0, dated August 29, 2019.

This is a major update to USB-C and contains required amendments to support the new USB4™ Spec.

One of those amendments? Introducing a new data rate: 20Gbps per lane, or 40Gbps total. This is called “USB4 Gen 3” in the new spec. One more data rate means the matrix of cables grows by a row, so we now have 8 C-to-C cable kinds; see Table 3-1:

[Table 3-1 from the USB Type-C Specification Revision 2.0, listing the eight C-to-C cable kinds]

Listed (the new cables, shown in bold in the table, are numbers 7 and 8):

  1. USB 2.0 rated at 3A
  2. USB 2.0 rated at 5A
  3. USB 3.2 Gen 1 rated at 3A
  4. USB 3.2 Gen 1 rated at 5A
  5. USB 3.2 Gen 2 rated at 3A
  6. USB 3.2 Gen 2 rated at 5A
  7. USB4 Gen 3 rated at 3A
  8. USB4 Gen 3 rated at 5A

New cables 7 and 8 have the same number of wires as cables 3 through 6, but are built to tolerances such that they can sustain 20Gbps per set of differential pairs, or 40Gbps for the whole cable. This is the maximum data rate in the USB4 Spec.

Also, please notice in the table above that the (informative) maximum cable length shrinks as speed increases: Gen 1 cables can be 2 m long, while Gen 3 cables top out at 0.8 m. This is just a practical consequence of physics and signal integrity when it comes to passive cables.

Data Rates

Data rates require some explanation too, as advancements since USB 3.1 mean that the same physical cable is capable of much more when used in a USB4 system.

A USB 3.1 Gen 1 cable built and sold in 2015 would have been advertised to support 5Gbps operation. Fast forward to 2019 or 2020: that exact same physical (Gen 1) cable will actually allow you to hit 20Gbps using USB4. This is due to advancements in the underlying PHY on the host and client side, but also because USB4 uses all 8 SuperSpeed wires simultaneously, while USB 3.1 only used 4 (dual-lane versus single-lane operation).

The same goes for USB 3.1 Gen 2 cables, which would have been sold as 10Gbps cables. They are able to support 20Gbps operation in USB4, again because of dual-lane operation.


What about Thunderbolt 3 cables? Thunderbolt 3 cables physically look the same as USB-C to USB-C cables, and the passive variants mostly comply with the existing USB-C spec, so they could be regarded as USB-C cables of kinds 3 through 6. However, Intel needed a way to mark some of their cables as 20Gbps and 40Gbps capable years before the USB-IF defined the Gen 3 data rate level. They did so using extra data objects in the Thunderbolt 3 cables' electronic marker, amounting to extra registers that mark the cable as 20Gbps or 40Gbps capable.

The good news is that, since Intel decided to open up the Thunderbolt 3 spec, the USB-IF was able to fully incorporate passive 20Gbps and 40Gbps Thunderbolt 3 cables, so they are supported by USB4 devices. A passive 40Gbps TBT3 cable you bought in 2016 or 2017 will just work at 40Gbps on a USB4 device in 2020.

How Linux USB PD and USB4 systems can help identify cables for users

By now, you are likely ever so confused by this mess of cable and data rate possibilities. The fact that I need a matrix and a decoder ring to explain the landscape of USB-C cables is a bad sign.

In the real world, your average user will pick a cable and simply not be able to determine its capabilities by looking at it. Even if the cable has the appropriate logo to distinguish it, not every user will understand what the hieroglyphs mean.

Software and Power Delivery, however, may very well help with this. I've been looking very closely at the kernel's USB Type-C Connector Class.

The connector class creates the following structure in sysfs, populating these nodes with important properties queried from the cable, the USB-C port, and the port's partner:

/sys/class/typec/
/sys/class/typec/port0 <---------------------------Me
/sys/class/typec/port0/port0-partner/ <------------My Partner
/sys/class/typec/port0/port0-cable/ <--------------Our Cable
/sys/class/typec/port0/port0-cable/port0-plug0 <---Cable SOP'
/sys/class/typec/port0/port0-cable/port0-plug1 <---Cable SOP"
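
As a quick illustration of where this is heading, user space can simply read those nodes. Below is a minimal sketch in C that dumps a couple of the cable identity attributes; the exact attribute names (identity/id_header, identity/product) are the ones I recall the connector class exposing, so treat them as an assumption and check the sysfs ABI documentation for your kernel.

#include <stdio.h>

int main(void)
{
        /* Hypothetical example paths, following the layout shown above. */
        const char *attrs[] = {
                "/sys/class/typec/port0/port0-cable/identity/id_header",
                "/sys/class/typec/port0/port0-cable/identity/product",
        };
        char buf[64];

        for (unsigned int i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++) {
                FILE *f = fopen(attrs[i], "r");

                if (!f)
                        continue;       /* attribute missing on this system */
                if (fgets(buf, sizeof(buf), f))
                        printf("%s: %s", attrs[i], buf);
                fclose(f);
        }

        return 0;
}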

You may see where I'm going with this. Once user space is able to see what the cable and its e-marker chip have advertised, an app or settings panel in the OS could tell the user what the cable is and, hopefully in clear language, what the cable can do, even if the cable is unlabeled or the user doesn't understand the obscure logos.

Lots of work remains here. The present Type-C Connector class needs to be synced with the latest version of the USB-C and PD spec, but this gives me hope that users will have a tool (any USB-C phone with PD) in their pocket to quickly identify cables.

 
Read more...

from metan's blog

What is wrong with sleep() then?

First of all, this is something I have had to fight off a lot, and still do from time to time. In most cases sleep() has been misused to avoid the need for proper synchronization, which is wrong for at least two reasons.

The first is that it may, and will, introduce very rare test failures, which means somebody has to spend time looking into them, which is wasted effort. I'm also pretty sure that nobody likes tests that fail rarely for no good reason. Even worse, you cannot run such tests with a background load to ensure that everything works correctly on a busy system, because that would increase the likelihood of a failure.

The second is that it wastes resources and slows down a test run. If you think that adding a sleep to a test is not a big deal, let me put things into perspective. There are about 1600 syscall tests in the Linux Test Project (LTP); if 7.5% of them slept for just one second, we would end up with two minutes of wasted time per test run. In practice most of the tests I've seen waited for much longer, just to be sure that things would work even on slower hardware. With sleeps between 2 and 5 seconds that puts us somewhere between 4 and 10 minutes, which is between 13% and 33% of the syscall runtime on my dated ThinkPad, where the run finishes in a bit less than half an hour. It's even worse on newer hardware, because this slowdown will not change no matter how fast your machine is, which is maybe why this was acceptable twenty years ago, but it is not now.
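
Spelling the arithmetic out: 1600 tests × 7.5% = 120 sleeping tests; at one second each that is 120 seconds, i.e. the two minutes above, and at 2 to 5 seconds each it becomes 240 to 600 seconds, i.e. the 4 to 10 minutes quoted.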

When is sleep() acceptable then?

So far in my ten years of test development I have met only a few cases where sleep() in test code was appropriate. Off the top of my head I remember:

  • Filesystem tests for file timestamps, atime, mtime, etc.
  • Timer related tests where we sample timer in a loop
  • alarm() and timer_create() test where we wait for the timer to fire
  • Leap second tests

How to fix the problem?

Unfortunately there is no silver bullet since there are plenty of reasons for a race condition to happen and each class has to be dealt with differently.

Fortunately there are quite a few very common classes that could be dealt with quite easily. So in LTP we wrote a few synchronization primitives and helper functions that could be used by a test, so there is no longer any excuse to use sleep() instead.

The most common case was the need to synchronize between parent and child processes. There are actually two different cases that needed to be solved. The first is a case where the child has to execute a certain piece of code before the parent can continue. For that, the LTP library implements checkpoints with simple wait and wake functions based on futexes on a piece of shared memory set up by the test library (see the sketch below). The second case is where the child has to sleep in a syscall before the parent can continue, for which we have a helper that polls /proc/$PID/stat. Also, sometimes tests can be fixed just by adding a waitpid() in the parent, which ensures that the child has finished before the parent runs.
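
To make the checkpoint case concrete, here is a minimal sketch of what such a test looks like with the new LTP library. The names used here (TST_CHECKPOINT_WAIT, TST_CHECKPOINT_WAKE, the .needs_checkpoints and .forks_child flags) are written from memory, so treat the exact spelling as approximate and consult the LTP library documentation.

#include <stdlib.h>
#include "tst_test.h"

static void run(void)
{
        pid_t pid = SAFE_FORK();

        if (!pid) {
                /* Child: do the setup the parent has to wait for... */
                TST_CHECKPOINT_WAKE(0);  /* ...then wake the parent */
                exit(0);
        }

        /* Parent: block on a futex until the child hits the checkpoint,
         * instead of guessing with sleep(). */
        TST_CHECKPOINT_WAIT(0);

        /* ...the actual test... */

        SAFE_WAITPID(pid, NULL, 0);
        tst_res(TPASS, "child and parent synchronized without sleep()");
}

static struct tst_test test = {
        .test_all = run,
        .forks_child = 1,
        .needs_checkpoints = 1,
};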

There are other and even more complex cases where a particular action is done asynchronously, or a kernel resource deallocation is deferred to a later time. In such cases quite often the best we can do is to poll. In LTP we ended up with a macro that polls by calling a piece of code in a loop with exponentially increasing sleeps between retries. This means that instead of sleeping for the maximum time the event could possibly take, the sleep is capped at roughly twice the optimal waiting time, while we still avoid polling too aggressively.
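
The idea behind that macro is roughly the following (a hand-written sketch of the concept, not the actual LTP implementation):

#include <unistd.h>

/* Call cond() in a loop with exponentially increasing sleeps until it
 * returns non-zero, or until the next sleep would exceed max_delay_us.
 * Returns 0 on success, -1 on timeout. */
static int poll_with_backoff(int (*cond)(void), unsigned int max_delay_us)
{
        unsigned int delay_us = 1;

        for (;;) {
                if (cond())
                        return 0;
                if (delay_us > max_delay_us)
                        return -1;
                usleep(delay_us);
                delay_us *= 2;  /* exponential backoff */
        }
}

Because the sleeps double each time, the total time spent sleeping is bounded by roughly twice the last sleep, which is where the “capped at about twice the optimal waiting time” property comes from.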

 
Read more...

from tglx

E-Mail interaction with the community

You might have been referred to this page with a form letter reply. If so, the form letter has been sent to you because you sent e-mail in a way which violates one or more of the common rules of e-mail communication in the context of the Linux kernel or some other Open Source project.

Private mail

Help from the community is provided as a free of charge service on a best effort basis. Sending private mail to maintainers or developers is pretty much a guarantee for being ignored or redirected to this page via a form letter:

  • Private e-mail does not scale. Maintainers and developers have limited time and cannot answer the same questions over and over.

  • Private e-mail limits the audience. Mailing lists allow people other than the relevant maintainers or developers to answer your question. Mailing lists are archived, so the answer to your question is available for public search and helps to avoid the same question being asked again and again. Private e-mail also limits the ability to include the right experts in a discussion, as that would first need your consent to give a person who was not included in your Cc list access to the content of your mail and to your e-mail address. When you post to a public mailing list you have already given that consent by doing so. It's usually not required to subscribe to a mailing list: most mailing lists are open, and those which are not are explicitly marked so. If you send e-mail to an open list, the replies will have you in Cc, as this is the general practice.

  • Private e-mail might be considered deliberate disregard of documentation. The documentation of the Linux kernel and other Open Source projects gives clear advice on how to contact the community. It's clearly spelled out that the relevant mailing lists should always be included. Adding the relevant maintainers or developers to Cc is good practice and usually helps to get the attention of the right people, especially on high volume mailing lists like LKML.

  • Corporate policies are not an excuse for private e-mail. If your company does not allow you to post on public mailing lists with your work e-mail address, please go and talk to your manager.

Confidentiality disclaimers

When posting to public mailing lists the boilerplate confidentiality disclaimers are not only meaningless, they are absolutely wrong for obvious reasons.

If that disclaimer is automatically inserted by your corporate e-mail infrastructure, talk to your manager or IT department, or consider using a different e-mail address which is not affected by this. Quite a few companies have dedicated e-mail infrastructure to avoid this problem.

Reply to all

Trimming Cc lists is usually considered a bad practice. Replying only to the sender of an e-mail immediately excludes all other people involved and defeats the purpose of mailing lists by turning a public discussion into a private conversation. See above.

HTML e-mail

HTML e-mail – even when it is a multipart mail with a corresponding text/plain section – is unconditionally rejected by mailing lists. The text/plain section of multipart HTML e-mail is generated by e-mail clients and often results in completely unreadable gunk.

Multipart e-mail

Again, use plain text e-mail and not some magic format. Also refrain from attaching patches, as that makes it impossible to reply to the patch directly. The kernel documentation contains elaborate explanations of how to send patches.

Text mail formatting

Text-based e-mail should not exceed 80 columns per line of text. Consult the documentation of your e-mail client to enable proper line breaks around column 78.

Top-posting

If you reply to an e-mail on a mailing list, do not top-post. Top-posting may be the preferred style in corporate communications, but that is not an excuse for it:

A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?

A: Top-posting. Q: What is the most annoying thing in e-mail?

A: No. Q: Should I include quotations after my reply?

See also: http://daringfireball.net/2007/07/on_top

Trim replies

If you reply to an e-mail on a mailing list trim unneeded content of the e-mail you are replying to. It's an annoyance to have to scroll down through several pages of quoted text to find a single line of reply or to figure out that after that reply the rest of the e-mail is just useless ballast.

Quoting code

If you want to refer to code or a particular function then mentioning the file and function name is completely sufficient. Maintainers and developers surely do not need a link to a git-web interface or one of the source cross-reference sites. They are definitely able to find the code in question with their favorite editor.

If you really need to quote code to illustrate your point, do not copy it from some random web interface, as that again turns into unreadable gunk. Insert the code snippet from the source file and only insert the absolute minimum of lines to make your point. Again, people are able to find the context on their own, and while your hint might be correct, in many cases the issue you are looking into is root-caused at a completely different place.

Does not work for you?

In case you can't follow the rules above and the documentation of the Open Source project you want to communicate with, consider seeking professional help to solve your problem.

Open Source consultants and service providers charge for their services and therefore are willing to deal with HTML e-mail, disclaimers, top-posting and other nuisances of corporate style communications.

 
Read more...


from metan's blog

We are trying to resurrect the automated testing track at FOSDEM, along with Anders and Matt from Linaro. Now, that will not happen, and does not even make sense to happen, without you. I can talk and drink beer in the evening with Matt and Anders just fine; we do not need a devroom for that.

We also have a few people prepared to give talks on various topics, but not nearly enough to fill and justify a room for a day. So if you are interested in attending, or even better giving a talk or driving a discussion, please let us know.

 
Read more...

from joelfernandes

Note: At the time of this writing, the latest kernel release is v5.3. RCU moves fast and can change in the future, so some details in this article may become obsolete.

The RCU subsystem and the task scheduler are inter-dependent: they both rely on each other to function correctly. The scheduler has many data structures that are protected by RCU, and RCU may need to wake up threads to perform things like completing grace periods and callback execution. One such case where RCU does a wake-up and enters the scheduler is rcu_read_unlock_special().

Recently Paul McKenney consolidated RCU flavors. What does this mean?

Consider the following code executing in CPU 0:

preempt_disable();
rcu_read_lock();
rcu_read_unlock();
preempt_enable();

And, consider the following code executing in CPU 1:

a = 1;
synchronize_rcu();  // Assume synchronize_rcu
                    // executes after CPU0's rcu_read_lock
b = 2;

CPU 0's execution path shows 2 flavors of RCU readers, one nested into another. The preempt_{disable,enable} pair is an RCU-sched flavor RCU reader section, while the rcu_read_{lock,unlock} pair is an RCU-preempt flavor RCU reader section.

In older kernels (before v4.20), CPU 1's synchronize_rcu() could return after CPU 0's rcu_read_unlock() but before CPU 0's preempt_enable(). This is because synchronize_rcu() only needs to wait for the “RCU-preempt” flavor of the RCU grace period to end.

In newer kernels (v4.20 and above), the RCU-preempt and RCU-sched flavors have been consolidated. This means CPU 1's synchronize_rcu() is guaranteed to wait for both CPU 0's rcu_read_unlock() and CPU 0's preempt_enable() to complete.

Now, let's get a bit more detailed. That rcu_read_unlock() most likely does very little. However, there are cases where it needs to do more, by calling rcu_read_unlock_special(). One such case is if the reader section was preempted. A few more cases are:

  • The RCU reader is blocking an expedited grace period, so it needed to report a quiescent state quickly.
  • The RCU reader is blocking a grace period for too long (~100 jiffies on my system; that's the default, but it can be set with the rcutree.jiffies_till_sched_qs parameter).

In all these cases, rcu_read_unlock() needs to do more work. However, care must be taken when calling rcu_read_unlock() from the scheduler, which is why this article is about scheduler deadlocks.

One of the reasons rcu_read_unlock_special() needs to call into the scheduler is priority de-boosting: a task getting preempted in the middle of an RCU read-side critical section blocks the completion of that critical section and hence could prevent current and future grace periods from ending. So the priority of the RCU reader may need to be boosted so that it gets enough CPU time to make progress and let the grace period end soon. But it also needs to be de-boosted after the reader section completes. This de-boosting happens by calling rcu_read_unlock_special() in the outermost rcu_read_unlock().

What could go wrong with the scheduler using RCU? Let us see this in action. Consider the following piece of code executed in the scheduler:

reader()
{
        rcu_read_lock();
        do_something();     // Preemption happened
        /* Preempted task got boosted */
        task_rq_lock();     // Disables interrupts
        rcu_read_unlock();  // Need to de-boost
        task_rq_unlock();   // Re-enables interrupts
}

Assume that the rcu_read_unlock() needs to de-boost the task's priority. This may cause it to enter the scheduler and cause a deadlock due to recursive locking of RQ/PI locks.

Because of these kinds of issues, there has traditionally been a rule that RCU usage in the scheduler must follow:

“Thou shalt not hold RQ/PI locks across an rcu_read_unlock() unless thou art holding them, or disabling IRQs, across both the rcu_read_lock() and the rcu_read_unlock().”

More on this rule can be read here as well.

Obviously, acquiring the RQ/PI locks across the whole rcu_read_lock() and rcu_read_unlock() pair would resolve the above situation: since preemption and interrupts are disabled across the whole pair, there is no question of task preemption.
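
In other words, the safe pattern looks roughly like this (a sketch in the style of the earlier example, not actual kernel code):

reader()
{
        task_rq_lock();     // Disables interrupts for the whole section
        rcu_read_lock();
        do_something();     // No preemption possible here, so no boosting
        rcu_read_unlock();  // Nothing to de-boost, no call into the scheduler
        task_rq_unlock();   // Re-enables interrupts
}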

Anyway, the point is that rcu_read_unlock() needs to be careful about scheduler wake-ups: either by avoiding calls to rcu_read_unlock_special() altogether (as is the case if interrupts are disabled across the entire RCU reader), or by detecting situations where a wake-up is unsafe. Peter Zijlstra says there's no way to know when the scheduler uses RCU, so “generic” detection of the unsafe condition is a bit tricky.

Now with RCU consolidation, the above situation actually improves. Even if the scheduler RQ/PI locks are not held across the whole read-side critical section, but just across the rcu_read_unlock(), that by itself may be enough to prevent a scheduler deadlock. The reasoning is: during the rcu_read_unlock(), we cannot report a QS until the RQ/PI lock is itself released, since the act of holding the lock means preemption is disabled, and that causes the QS to be deferred. As a result, the priority de-boosting is also deferred, preventing a possible scheduler deadlock.

However, RCU consolidation introduces even newer scenarios where rcu_read_unlock() has to enter the scheduler if the “scheduler rule” above is not honored, as explained below.

Consider the previous code example. Now also assume that the RCU reader is blocking an expedited RCU grace period. That is just a fancy term for a grace period that needs to end fast; expedited grace periods have to complete much more quickly than normal ones. An expedited grace period causes currently running RCU reader sections to receive IPIs that set a hint. Setting this hint results in the outermost rcu_read_unlock() calling rcu_read_unlock_special(), which otherwise would not occur. When rcu_read_unlock_special() gets called in this scenario, it gets more aggressive once it notices that the reader has blocked an expedited RCU grace period. In particular, it notices that preemption is disabled, so the grace period cannot end, due to RCU consolidation. Out of desperation, it raises a softirq (raise_softirq()) in the hope that the next time the softirq runs, the grace period can be ended quickly, before the scheduler tick occurs. But that can itself cause a scheduler deadlock by way of entry into the scheduler due to a ksoftirqd wakeup.

The cure for this problem is the same: holding the RQ/PI locks across the entire reader section leaves no question of a scheduler-related deadlock due to recursively acquiring these locks, because there would be no question of expedited-grace-period IPIs, hence no question of setting any hints, and hence no question of calling rcu_read_unlock_special() from scheduler code. For a twist on the IPI problem, see the special note.

However, the RCU consolidation throws yet another curve ball. Paul McKenney explained on LKML that there is yet another situation now due to RCU consolidation that can cause scheduler deadlocks.

Consider the following code, where previous_reader() and current_reader() execute in quick succession in the context of the same task:

previous_reader()
{
        rcu_read_lock();
        do_something();      // Preemption or IPI happened
        local_irq_disable(); // Cannot be the scheduler
        do_something_else();
        rcu_read_unlock();   // As IRQs are off, defer QS report,
                             // but set the deferred_qs bit in
                             // rcu_read_unlock_special
        do_some_other_thing();
        local_irq_enable();
}

// QS from previous_reader() is still deferred.
current_reader()
{
        local_irq_disable();  // Might be the scheduler.
        do_whatever();
        rcu_read_lock();
        do_whatever_else();
        rcu_read_unlock();    // Must still defer reporting QS
        do_whatever_comes_to_mind();
        local_irq_enable();
}

Here previous_reader() was preempted, even though current_reader() was not, yet current_reader() still needs to call rcu_read_unlock_special() from the scheduler! This situation would not happen in the pre-consolidated-RCU world, because previous_reader()'s rcu_read_unlock() would have taken care of it.

As you can see, just following the scheduler rule of disabling interrupts across the entire reader section does not help. To detect the above scenario, a new bitfield, deferred_qs, has been added to the task_struct::rcu_read_unlock_special union. Now what happens is that, at rcu_read_unlock() time, previous_reader() sets this bit and current_reader() checks it. If it is set, the call to raise_softirq() is avoided, thus eliminating the possibility of a scheduler deadlock.

Hopefully no other scheduler deadlock issue is lurking!

Coming back to the scheduler rule, I have been running overnight rcutorture tests to detect if this rule is ever violated. Here is the test patch checking for the unsafe condition. So far I have not seen this condition occur, which is a good sign.

I may need to check with Paul McKenney about whether proposing this checking for mainline is worth it. Thankfully, LPC 2019 is right around the corner! ;–)


Special Note

[1] The expedited IPI interrupting an RCU reader has a variation. See the example below, where the IPI was not received but we still have a problem, because the ->need_qs bit in the rcu_read_unlock_special union got set even though the expedited grace period started after IRQs were disabled. The start of the expedited grace period sets the rnp->expmask bit for the CPU. In the unlock path, because the ->need_qs bit is set, rcu_read_unlock() will still call rcu_read_unlock_special() and risk a deadlock by way of a ksoftirqd wakeup, because exp in that function is true.

CPU 0                         CPU 1
preempt_disable();
rcu_read_lock();

// do something real long

// Scheduler-tick sets
// ->need_qs as reader is
// held for too long.

local_irq_disable();
                              // Expedited GP started
// Exp IPI not received
// because IRQs are off.

local_irq_enable();

// Here rcu_read_unlock will
// still call ..._special()
// as ->need_qs got set.
rcu_read_unlock();

preempt_enable();

The fix for this issue is the same as described earlier, disabling interrupts across both rcu_read_lock() and rcu_read_unlock() in the scheduler path.

 
Read more...

from metan's blog

The problem

More than ten years ago even consumer grade hardware started to have two or more CPU cores. These days you can easily buy a PC with eight cores even in a local computer shop. The hardware has evolved quite fast during the last decade, but the software that does kernel testing hasn't. These days we still run LTP test cases sequentially, even on beefy servers with 64+ cores and terabytes of RAM. That means that a syscalls test run takes 30+ minutes, while it could be reduced to less than 10 minutes, which would significantly shorten the feedback loop for any continuous integration.

The naive solution

The naive solution is obviously to run $NCPU+1 tests in parallel. But there is a catch: some of the tests utilize global system resources or state, and running two such tests in parallel would lead to false negatives. If you think that this situation is rare, you are mistaken; there are plenty of tests that need a block device, change the system wall clock, sample kernel timers in a loop, play with SysV IPC, networking, etc. In short, we would end up with many hard to reproduce race conditions and mostly useless results.

The proper solution

The proper solution would require:

  1. Make tests declare which resources they utilize
  2. Export this data to the testrunner
  3. Make the testrunner use that information when running tests

Happily, the first part is mostly done for LTP; we have switched to the new test library I wrote a few years ago. The new library defines a “driver model” for a test case. At this time about half of the tests are converted to the new library and the progress is steady.

In the new library the resources a test needs are listed in a declarative way, as a C structure, so for most cases we already have this information in place. If you want to know more you can have a look at our documentation.
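
For illustration, such a declaration looks roughly like this (a sketch; the exact flag names, e.g. .needs_device and .restore_wallclock, are from my memory of the library, so double-check them against the LTP documentation):

#include "tst_test.h"

static void run(void)
{
        /* ...the actual test... */
        tst_res(TPASS, "placeholder");
}

static struct tst_test test = {
        .test_all = run,
        .needs_root = 1,         /* test has to run as root */
        .needs_device = 1,       /* test needs a scratch block device */
        .restore_wallclock = 1,  /* test changes the system wall clock */
};

A parallel-aware testrunner could then, for example, refuse to schedule two tests that both declare .restore_wallclock at the same time.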

At this point I'm trying to solve the second part, making the data available to the testrunner, which mostly works already. Once this part is finished, the missing piece will be writing a scheduler that takes this information into account.

Unfortunately the way LTP runs tests is also dated; there is an old runltp script which I would call an ugly hack at best. Adding a feature such as parallel test runs to that code is close to impossible, which is the reason I've started to work on a new LTP testrunner. That, I guess, could be a subject for a second blog post.

 
Read more...

from Konstantin Ryabitsev

After my trusty Pebble 2 died about 6 months ago, I needed some kind of replacement that would do the following:

  1. buzz my wrist and show me alerts from any app (not just calls/texts)
  2. have a long-lasting battery without being huge
  3. count my daily steps and prod me when I haven't moved for a while
  4. not spy on me continuously and feed my data to a shady entity

The solution I settled on was an Amazfit Bip. It does almost all of the above:

  1. it offers Bluetooth LE with full notifications integration
  2. the battery lasts about a month (!) — my biggest problem is actually finding where the heck I put the charger, since I use it so rarely
  3. it has a step/heartbeat/sleep tracker

It also costs about US$80.

Now, the default smartphone app that comes with it doesn't particularly inspire confidence regarding that point #4 in my requirements list. I'm not trying to accuse anyone of anything, but I am not entirely brimming with confidence that the abundant personal data it collects about me is never going to be used for nefarious purposes.

The good news is that the Amazfit Bip is fully supported by Gadgetbridge, which is a free software application installable via F-Droid. The version of the Amazfit Bip that I got 6 months ago required a firmware update to work with Gadgetbridge, which required that I install the Amazfit manufacturer app in order to upgrade it (which I did from one of the old junker phones I have lying around). However, after that I was able to pair it with Gadgetbridge on multiple phones. It is not strictly necessary to use the official app for the initial step, but the alternative looked more complicated than just using a junker phone to shortcut the process.

In the end, I spent $80 and a couple of hours to get a wrist gadget that does all I need, fits well, and doesn't spy on me. Freeyourgadget has a lot more info if you're interested.

 
Read more...

from mcgrof

I'm announcing the release of kdevops, which aims at making setting up and testing the Linux kernel for any project as easy as possible. Note that setting up testing for a subsystem and testing a subsystem are two separate operations; however, we strive for both. This is not a new test framework; it allows you to use existing frameworks and to set those frameworks up as easily as humanly possible. It relies on a series of modern, hip devops frameworks: ansible, vagrant and terraform, along with ansible roles through Ansible Galaxy, and terraform modules.

Three example demo projects are released which demo its use:

  • kdevops – skeleton generic example using linux-stable
  • fw-kdevops – used for testing firmware loading using linux-next. This example demo was written in about one hour tops by forking kdevops, trimming it, adding a new ansible galaxy for selftests. You are expected to be able to fork it and add your respective kernel selftest fork in a minute
  • oscheck – actively being used to test and advance the XFS filesystem for stable kernel releases. If you fork this to try to add support for testing a new filesystem under a new project, please let me know how long it took you to do that.

Fancy pictures in a nutshell

Of course you just want pictures and the ability to go home after seeing them. Should these be on instagram as well? Gosh.

A first run of kdevops

On a first run:

Running the bootlinux role on just one host

Example run of just running the ansible bootlinux role on just one host:

End of running the bootlinux ansible role on just one host

This shows what it looks like at the end of running the ansible bootlinux role after the host has booted into the new shiny kernel:

Logging into the test systems

Well, since we set up your ~/.ssh/config for you, all you gotta do now is just ssh in to the target host you want to test; it will already have the shiny new kernel installed and booted:

Motivations for kdevops

Below I'll document just a bit of the motivation behind this project. The documentation and demo projects should hopefully suffice for how to use all this.

Testing ain't easy, brah!

Getting contributors to your subsystem / driver in Linux is wonderful; however, ensuring it doesn't break anything is a completely separate matter. It is my belief that testing a patch to ensure no testable regressions exist should be painless and simple. However, that has never been the case.

Testing frameworks ain't easy to setup, brah!

Linux kernel testing frameworks should also be really easy to set up, but that is typically never the case either. One example of the complexity of setting up a test framework is fstests, used to test Linux kernel filesystems and to ensure, to the best of our ability, that a new patch doesn't regress the kernel against a baseline. But wait, what is the baseline?

Setting up test systems ain't easy to ramp up, brah!

Another difficulty with testing the Linux kernel comes from the fact that you don't want to test the kernel on the same system you're working on; otherwise you'd crash it, and if you're testing filesystems you may even end up corrupting your filesystem. So typically folks end up using virtualization technologies to set up virtual machines, boot into them, and then use the virtualized hosts as test vehicles. Another alternative is to use cloud service providers such as OpenStack, Azure, Amazon Web Services, or Google Cloud Compute to create hosts in the cloud and use these instead. But I've heard complaints about how even setting up KVM can be complex, even from kernel developers! Some kernel developers don't want to know how to set up a virtual environment to test things.

I hear ya, brah!

My litmus test for full setup complexity is all the work required to set up fstests to test a Linux filesystem. If a solution for all the woes above were ever to be provided, I figured it would have to let you easily set up fstests to test XFS without you doing much work.

I started looking into this effort first by trying to provide my own set of wrappers around KVM to let you easily set up KVM, then extended this effort to easily set up fstests. Both efforts were all shell hacks... They worked for me, but I was still not really happy with it all. It seemed hacky.

Ted Ts'o's xfstests-bld.git provided a cloud environment solution for setting up fstests on Google Cloud Compute for the ext filesystems (ext2, ext3, ext4); however, I was not satisfied with this, given that I wanted to make it easy to test any filesystem and to be cloud provider agnostic.

ansible provides a proper replacement for shell hacks, in a distribution-agnostic and even OS-agnostic manner. Vagrant lets me replace all those terrible original bash hacks for setting up KVM with simple, elegant descriptions of what I want a target set of hosts to look like; it also lets me support not only KVM but also VirtualBox, and even Mac OS X. Terraform accomplishes the same but for cloud environments, and supports different providers.

Feedback and rants welcomed

So, give the repositories a shot, I welcome feedback and rants.

kdevops is intended to be used as the de-facto example for all of the ansible roles, and terraform modules.

fw-kdevops is intended to be forked by folks wanting a simple two host test setup where all you need is linux-next and to run selftests.

oscheck is already actively used to help advance XFS on the stable kernel releases, and is intended to be forked by folks who want to use fstests to test any filesystem on any kernel release.

 
Read more...

from Greg Kroah-Hartman

As I was asked this 3 times today (once on irc, and twice in email): no, the 5.3 kernel release is NOT the next planned Long Term Supported (LTS) release.

I've been saying for a few years now that I would pick the “last released” kernel of the year to be the next LTS release. And as per the wonderful pointy-hair-crystal-ball, that looks to be the 5.4 kernel release this year.

So, count on it being 5.4, unless something really bad happens in that release, such as people throwing in loads of crud because they “need” it for the LTS release. If that happens again, I'll just have to pick a different release...

 
Read more...

from Konstantin Ryabitsev

If you need to run a CentOS/RHEL 7 system where GnuPG is stuck on version 2.0, you can use the gnupg22-static package I'm maintaining for our own needs at Fedora COPR.

It installs into /opt/gnupg22 so it doesn't clash with the version of GnuPG installed and used by the system.

To start using the gnupg22-static binaries, you will need to first enable the COPR repository:

# yum install yum-plugin-copr
# yum copr enable icon/lfit
# yum install gnupg22-static

The static compilation process is not perfect, because it hardcodes some defaults to point to the buildroot locations (which don't exist in an installed RPM), so you will need to tell your gpg binary where its auxiliary programs live by adding the file called ~/.gnupg/gpg.conf-2.2 with the following content:

agent-program   /opt/gnupg22/bin/gpg-agent
dirmngr-program /opt/gnupg22/bin/dirmngr

Now you just need to add a couple of aliases to your ~/.bash_profile:

  alias gpg="/opt/gnupg22/bin/gpg"
  alias gpg2="/opt/gnupg22/bin/gpg"

Alternatively, you can list /opt/gnupg22/bin earlier in the path:

export PATH=/opt/gnupg22/bin:$PATH

You should now be able to enjoy GnuPG-2.2 features such as support for ECC keys and Web Key Directories.

 
Read more...

from Benson Leung

This issue came up recently for a high profile new gadget that has made the transition from Micro-USB to USB-C in its latest version, the Raspberry Pi 4. See the excellent blog post by Tyler (aka scorpia): https://www.scorpia.co.uk/2019/06/28/pi4-not-working-with-some-chargers-or-why-you-need-two-cc-resistors/

The short summary is that bad things (no charging) happen if the CC1 and CC2 pins are shorted together anywhere in a USB-C system that is not an audio accessory. When combined with more capable cables (those handling SuperSpeed data or 5A power), this configuration will cause compliant chargers to provide 0V instead of 5V to the Pi.

The Raspberry Pi folks made a very common USB-C hardware design mistake that I have personally encountered dozens of times in prototype hardware and in real gear that was sold to consumers.

What is unique about this case is that Raspberry Pi has posted schematics (thanks, open hardware!) of their board that very clearly show the error.


Excerpt from the reduced Pi4 Model B schematics, from https://www.scorpia.co.uk/wp-content/uploads/2019/06/image-300x292.png

Both of the CC pins in the Pi4 schematic above are tied together on one end of resistor R79, which is a 5.1 kΩ pulldown.

Contrast that to what the USB Type-C Specification mandates must be done in this case.


USB Type-C's Sink Functional Model for CC1 and CC2, from USB Type-C Specification 1.4, Section 4.5.1.3.2

Each CC gets its own distinct Rd (5.1 kΩ), and it is important that they are distinct.

The Raspberry Pi team made two critical mistakes here. The first is that they designed this circuit themselves, perhaps trying to do something clever with current level detection, but failing to do it right. Instead of trying to come up with some clever circuit, hardware designers should simply copy the figure from the USB-C spec exactly. Figure 4-9, which I posted above, isn't simply a rough guideline of one way of making a USB-C receptacle. It is actually normative, meaning mandatory, required by the spec in order to call your system a compliant USB-C power sink. Just copy it.

The second mistake is that they didn't actually test their Pi4 design with advanced cables. I get it, the USB-C cable situation is confusing and messy, and I've covered in detail here that there are numerous different cables. However, cables with e-marker chips (the kind that would cause problems with the Pi4's mistake) are not that uncommon. Every single Apple MacBook since 2016 has shipped with a cable with an e-marker chip. The fact that no QA team inside Raspberry Pi's organization caught this bug indicates they only tested with one kind (the simplest kind) of USB-C cable.

Raspberry Pi, you can do better. I urge you to correct your design as soon as you can so you can be USB-C compliant.

 
Read more...

from Konstantin Ryabitsev

Imagine you are at a conference somewhere and a person you run across tells you about a project that you find interesting.

“Here,” they say, “I can share it with you if you've got Decent. I think it's a few days stale, sorry — roaming costs here are crazy and I don't trust the hotel wifi.”

You do have the Decent app on your phone, so you scan the QR code and wait until your phone shows you the usual “replica complete” checkmark. You don't even know if your phones used NFC, Bluetooth, or WiFi for this little chat (probably not Wifi, because it kinda sucks at all conferences), but the data went straight from their phone to yours without needing to hit any other systems on the net.

When you get to your hotel room, you decide to check out the project details. You open the Decent app on your phone and it shows a short ID code (xPqat3z). You can use it to replicate the project straight from your phone. You open up your travel laptop and run:

$ decent replicate xPqat3z
Looking for xPqat3z...found project "fizzbuzz" on "My Phone".
Cloning git...done.
Replicating ssb chains...done.

Since both your laptop and your phone are Bluetooth-paired, you are able to grab the project straight from your phone without having to hit the net again. You poke around the git tree, try to get things to work, but something is not quite right. You do have a fairly old version of libsnafu installed, so maybe that's the cause?

You run “decent tui” and search for “libsnafu”, which shows that someone has hit that very same problem just two days ago and opened a new issue, but there is no follow up yet.

Or is there?

You exit the tui and run:

[~/fizzbuzz.git]$ decent pull
Found .decent with 5 pub servers.
Establishing the fastest pub server to use...done.
Joining pub.kernel.org:80 using the invite code...done.
Updating git...1c8b2e7..620b5e2...done.
Updating ssb chains...done.
- 1 new participant
- 15 new conversations
- 12 new commits in 2 branches
- 15 patch updates (2 new)
- 9 issue updates (1 new)

Ding, when you view the libsnafu issue again you see that there have been new updates since it was created 2 days ago (the person you replicated from did say their replica was a bit stale). There is even a proposed patch that is supposed to fix the library compatibility problem.

You hit “enter” on the patch to review it. Seems like a straightforward fix, and you're happy to see that there are already a couple of Tested-by's from the usual CI bots, and a Reviewed-by from Taylor Thompson, the person you spoke with just earlier today — in fact, this Reviewed-by has a timestamp of only a few minutes ago. You guess Taylor is catching up on some work before dinner as well.

You type in “:apply snafutest” and decent automatically creates a snafutest branch, checks it out, and applies the proposed patch on top of it. Presto, fizzbuzz finally builds and works for you.

Being a good citizen, you decide to comment on the issue and add your own Tested-by. Since it's your first time participating in this project, you need to join first:

[~/fizzbuzz.git]$ decent join
Creating new SSB keypair...done.
Starting SSB replication agent...done.
Your name [Alex Anderson]: Alex Anderson
Device identifier: Travel laptop
Self-identifying as Alex Anderson (Travel Laptop)
(Required) Agree with terms stated in COPYING? [y/n/view]: y
Adding COPYING agreement record...done.
(Required) Agree with terms stated in COVENANT? [y/n/view]: y
Adding COVENANT agreement record...done.
Cross-certify? [SSB/PGP/Keybase/None]: PGP
Adding PGP public key record...done.
Adding signed feed-id record...
Enter PGP key passphrase: *********
Done.

Now that you've initialized your own developer chain, you can comment on the issue. You give it a thumbs-up, add your own Tested-by to the proposed patch, and join the #fizzbuzz-users and #fizzbuzz-dev channels. All of these actions are simply added records to your local SSB feed, which gets replicated to the pub server you'd joined earlier.

Other members of the project will automatically start getting your SSB feed updates either from the pub server they joined, or from other developers they are following. If a pub server becomes unavailable, anyone who's ever run “decent pull” will have replicas of all participating developer and bot feeds (which means full copies of all issues, patches, developer discussions, and CI reports — for the entirety of the project's existence). They can switch to a different pub server, set up their own, or just replicate between developers using the SSB gossip protocol that powers it all behind the scenes.

What is this magic?

The “decent” tool is fiction, but the SSB framework I'm describing is not. SSB stands for “Secure Scuttlebutt” (it's nautical slang for “gossip,” so please stop guffawing). SSB is a distributed gossip protocol that is built on the concept of replicating individual “sigchains,” which are very similar in concept to git. Each record references the hash of the previous record, plus SSB uses an ECC key to cryptographically sign every new entry, such that the entire chain is fully verifiable and attestable. Unless someone has access to the ECC secret key created at the beginning of the SSB chain, they would not be able to add new entries — and unless the chain has never been replicated anywhere, all entries are immutable (or the replication simply breaks if any of the existing records in it are modified).
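
Conceptually, each entry in such a feed carries something like the following (a simplified sketch of the idea, not the actual SSB message schema):

#include <stdint.h>

struct ssb_record {
        char     prev[64];       /* hash of the previous record in this feed */
        char     author[64];     /* public key identifying the feed */
        uint64_t sequence;       /* monotonically increasing entry number */
        char    *content;        /* the JSON payload: post, vote, contact, ... */
        char     signature[128]; /* signature over all of the above, made with
                                    the feed's private ECC key */
};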

The sigchains are only part of the story — SSB also offers a decentralized replication protocol that works hard to make sure that there is no single point of trust and no single point of failure. It is able to replicate using “pub” servers that merely work as convenient mediators, but are unnecessary for the overall health of the SSB fabric. SSB replication can be done peer-to-peer via local network, over the Internet, via Tor, sneakernet, or anything at all that is able to send and receive bits.

The end-tool on the client uses these individual feeds to assemble a narrative, using message-id cross-references to construct threads of conversations. SSB is envisioned as a fully-private and fully-decentralized social network where each participating individual shares an immutable activity record choosing how much to share publicly, how much to share with specific individuals, and how much to keep fully private.

I suggest we co-opt SSB for free software development to make it truly decentralized, self-archiving, and fully attestable in all developer interactions.

What problem are you solving?

If you've read my previous entries, you know that we've been working hard to archive and preserve mailing list discussions on lore.kernel.org. Mailing lists have served us well, but their downsides are very obvious:

  • email started out as decentralized, but the vast majority of it now flows through the same handful of providers (Google, Amazon, Microsoft). It's becoming more and more difficult to set up your own mail server and expect that mail you send out will be properly accepted and delivered by the “big guns.” Did you set up SPF? DKIM? DMARC? ARC-Seal? Is your IP blacklisted on RBL? Are you sure? Have you checked today?
  • Receiving email is also frustrating regardless whether you are using your own mail server or relying on one of the “big-gun” providers. If you're not running a spamchecker, you are probably wasting some part of your day dealing with spam. If you do filter your mail, then I hope you check your spam folder regularly (in which case you are still wasting the same amount of time on it, just more infrequently and in longer chunks at once).
  • Mailing list servers are single points of failure that have to send out amazing amounts of redundant data to all subscribers. If a mailing list provider becomes unavailable, this basically kills all project discussions until a new mailing list is set up and everyone re-subscribes. Usually, this also results in the loss of previous archives, because everyone assumes someone else has a full copy.
  • Mailing lists are lossy. If your mail starts bouncing for some reason (e.g. due to a full inbox), you usually end up unsubscribed and miss out on potentially important conversations. Unless you go back and check the archives, you may never become aware of what you missed.
  • Mail clients routinely mangle structured data. Anyone who's ever had to send out a patch is aware of the long-ish “how to configure your mail client so it doesn't mangle patches” section in the git docs.
  • Even if you do manage to successfully send patches, sending any other kind of structured data is the wild west. Bot reports, automated issue notifications, etc, attempt to present data as both human- and machine-readable and largely fail at both.
  • Everyone has pretty much given up on email encryption and attestation. PGP signatures in email are mostly treated like noise, because all clients kinda suck at PGP, and, more importantly, meaningful trust delegation is hard.

Duh, that's why nobody uses email

Extremely few projects still use email for software development. The Kernel is obviously an important exception to this, among a few others, and it's usually the kind of thing people like to mention to point out how behind-the-times kernel developers are. They should stop acting like such dinosaurs, get with the program and just start using Git..b already!

However, using Git..b obviously introduces both a single point of failure and a single point of trust. Git repositories may be decentralized, but commits are merely the final product of a lot of developer back-and-forth that ends up walled-in inside the beautiful Git..b garden. You can export your project from Git..b, but very few people bother to do so, and almost nobody does it on a regular basis.

If a maintainer steps away and all development moves to a different fork, the project loses part of its history that is not committed to git, because all its issues, CI test results, pull requests and conversations are now split between the old fork and the new fork. If the original developer has a personal crisis and purges their original repository, that part of the project history is now forever gone, even if the code remains.

Furthermore, if you've been around for a while, you've seen beautiful gardens come and go. Before Github there was Sourceforge, which at some point poisoned its beautiful wells by bundling adware with binary downloads. Google Code has come and gone, like most Google things do. Github has seen a significant exodus of projects to Gitlab after it got acquired by Microsoft, and there's certainly no guarantee that Gitlab won't be acquired by some other $TechGiant looking to spruce up its open-source community image.

Git is decentralized and self-archiving. Mailing lists... sort-of are — at least we are trying to keep them that way, but it's becoming more and more difficult. Even those projects that use mailing lists for patches may not use them for issue tracking or CI reports (for example, not all Bugzilla activity goes to mailing lists and Patchwork allows attaching CI reports directly to patches using its REST API).

I think it's way past time for us to come up with a solution that would offer a decentralized, self-archiving, fully attestable, “cradle-to-grave” development platform covering all aspects of project development, not just the code. It must move us away from mailing lists, while avoiding the introduction of single points of trust, authority, and failure.

And you think SSB is it?

I believe SSB offers a usable framework that we can build on to achieve this goal. The concept of sigchains is easy to convey due to their close resemblance to git, and the protocol's decentralized, mesh-like P2P replication is an important feature that helps us avoid single points of failure. Every participant receives the full history of the project, just as every participant who clones the git repository today receives the full history of the project's code.

In SSB, every sigchain (“feed”) is tied to a single identity: usually a device belonging to a real person, though it can also be a project-specific feed used by a bot. Developers would naturally have multiple identities (“work laptop”, “phone”, “travel laptop”) and can add new ones and retire old ones as they gain work environments or lose access to old devices. Feeds can be authenticated by each individual developer by cross-signing them with another identity framework (Keybase, PGP, etc.), or they can remain fully pseudonymous.

The important part here is that once an identity is established, all records created by it are attestable to the same person or entity that was in possession of the private ECC key at the time the feed was created. When a maintainer applies someone's patches to their git tree, simply referencing the SSB record-id of the patch in the commit message is enough to provide a full, immutable attestation chain for that code. It's like Signed-off-by on very powerful drugs.
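To illustrate (the “SSB-Record” trailer name and the placeholder id are entirely hypothetical, not an existing convention), the applied commit could simply carry a reference like this:

frobnicator: fix NULL pointer dereference on probe

SSB-Record: %<base64-sha256-of-the-patch-record>.sha256
Signed-off-by: Maya Maintainer <maya@example.org>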

Spammy feeds can be excluded using a blocklist file in the project repository, or a project can choose to maintain an allowlist explicitly naming authorized feeds (as long as it provides instructions on how to request addition to that list for the purposes of participation). Developers violating established community guidelines can be removed by adding a record indicating that their feeds should be replicated up to a specific entry and no further.
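As a purely hypothetical sketch (no such file format exists in SSB today; the file name, syntax, and termination marker below are all made up), an allowlist kept in the project repository might look like this:

# .ssb-allowlist: feeds permitted to post records to this project
@<maintainer-public-key>.ed25519
@<ci-bot-public-key>.ed25519
# stop replicating this feed after record 1024
!@<terminated-feed-public-key>.ed25519 1024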

Since SSB relies on cryptographic keypairs by design, it is easy to set up fully private discussion groups that are end-to-end encrypted to all group participants. This makes it easy to discuss sensitive subjects like security vulnerabilities without needing to rely on any other means of communication or any other privacy tools outside of what is already provided by SSB.

(We'll ignore for the moment the fact that all implementations of SSB are written in JavaScript/NPM — wait, don't go, hear me out! — since it uses standard crypto and the records themselves are JSON, everything about SSB is easily portable.)
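For reference, a record in a feed is just a signed JSON object whose previous field chains it to the prior record, which is what makes the feed a sigchain. Roughly (the content payload shown here is a hypothetical patch-submission type, not an existing SSB message type, and all values are placeholders):

{
  "previous": "%<id-of-previous-record>.sha256",
  "author": "@<feed-public-key>.ed25519",
  "sequence": 1337,
  "timestamp": 1570000000000,
  "hash": "sha256",
  "content": {
    "type": "patch",
    "subject": "[PATCH] frobnicator: fix NULL pointer dereference on probe",
    "mbox": "<base64-encoded patch>"
  },
  "signature": "<ed25519-signature>.sig.ed25519"
}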

Won't it raise the entry barrier?

I am acutely aware that such a system would significantly raise the participation barrier. It's one thing to open an issue or send a pull request on Git..b, attach a patch to a Bugzilla entry, or send an email to a mailing list. We cannot expect a “drive-by” contributor to install a client tool, replicate potentially tens of gigabytes of individual developer feeds, and create their own SSB identity simply to open an issue or submit a single patch. We would need full-featured web clients that allow someone to browse projects much as they would browse them on Git..b, including viewing issues, submitting bug reports, and sending patches and pull requests.

The main distinction from Git..b here is that these web clients — let's call them “community bridges” — would merely be rich API translation endpoints contributing to fully distributed projects, without locking developers into any walled gardens. They would enable collaboration without introducing central dependencies or points of failure, and anyone choosing to participate on their own terms using their own free software stack (e.g. with our fictional decent tool) would be fully empowered to do so. In fact, Git..b and the others could themselves become community bridges and allow their clients to participate in distributed projects for a truly “cross-garden” development experience.

(The web bridge would necessarily need to manage the contributor's identity to create their sigchain feed, but for “drive-by” contributions this would be a reasonable trade-off. Anyone can decide to switch to a local client and start a new identity at any time if they feel they can no longer trust the bridge they are using.)

I'm intrigued. What happens next?

I've been mulling this over for a while now, and this is really just the first brain dump of my thoughts. At this point, I need everyone to point out all the ways this wouldn't work (even more appreciated if it's followed by a “but it might work if...”). I realize that there is no way to leave a comment here, but you can reach out to me either via floss.social or by emailing me at mricon@kernel.org.

The Linux development community has already given us a powerful distributed development tool in the form of git, and I firmly believe it can also deliver a satellite tool to git that would encompass all aspects of project development beyond just the code. I hope that by outlining my thoughts here I'll be able to jumpstart the discussion that eventually gets us there.

PS: The title of this post references a talk by Greg KH titled Patches carved into stone tablets that goes into details on why kernel developers still use mailing lists for everything.

 

from Konstantin Ryabitsev

The mail archiving system at lore.kernel.org uses public-inbox, which relies on git as the mechanism to store messages. This makes the entire archive collection very easy to replicate using grokmirror — the same tool we use to mirror git.kernel.org repositories across multiple worldwide frontends.

Setting up

It doesn't take a lot to get started. First, install grokmirror either from pip:

pip install grokmirror

or from your distro repositories:

dnf install python3-grokmirror

Next, you will need a config file and a location where you'll store your copy (keep in mind, at the time of writing all of the archives take up upwards of 20GB):

[lore.kernel.org]
# Use the erol mirror instead of lore directly
site = https://erol.kernel.org
manifest = https://erol.kernel.org/manifest.js.gz
toplevel = /path/to/your/local/archive
mymanifest = %(toplevel)s/manifest.js.gz
log = %(toplevel)s/pull.log
pull_threads = 2

Save this file into lore.conf and just run:

grok-pull -v -c lore.conf

The initial clone is going to take a long time, but once it is complete, subsequent runs of grok-pull will only update the repositories that have changed. If new repositories are added, they will be automatically cloned and added to your mirror of the archive.
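To keep the mirror fresh, grok-pull can simply be run from cron, for example every half hour (adjust the path and frequency to taste):

# update the local lore.kernel.org mirror every 30 minutes
*/30 * * * * grok-pull -c /path/to/lore.conf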

Note: this by itself is not enough to run public-inbox on your local system, because there is a lot more to public-inbox than just the git archives of all messages. For starters, the archives would need to be indexed into a series of sqlite3 and xapian databases, and the end result would take up a LOT more than 20GB.
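If you do want to experiment with a local public-inbox instance on top of the mirrored repositories, the general shape of the process is to initialize a v2 inbox, place the mirrored epochs under its git/ directory, and run the indexer. The commands below are only a rough sketch with placeholder paths and addresses; consult the public-inbox documentation for the authoritative procedure:

# initialize a v2 inbox (all paths and addresses here are placeholders)
public-inbox-init -V2 lkml /path/to/inboxes/lkml https://example.com/lkml lkml@vger.kernel.org
# after placing or symlinking the mirrored epochs into /path/to/inboxes/lkml/git/
public-inbox-index /path/to/inboxes/lkml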

Future work

We are hoping to fund the development of a set of tools around public-inbox archives that would allow you to do cool stuff with submitted patches without needing to subscribe to LKML or any other list archived by lore.kernel.org. We expect this would be a nice feature that various CI bots can use to automatically discover and test patches without having to bother with SMTP and incoming mail processing. If you would like to participate, please feel free to join the public-inbox development list.
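As a taste of what such tooling could build on: because the archives are plain git repositories, the raw messages are already reachable with nothing but git. A minimal sketch, assuming the public-inbox v2 convention of storing each message as a blob named “m” in every commit (the repository path is a placeholder; the exact layout depends on your mirror):

# print the most recently archived message in epoch 0 of a mirrored list
git --git-dir=/path/to/your/local/archive/lkml/git/0.git show HEAD:m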

 