profile
viewpoint
Linus Torvalds torvalds Linux Foundation Portland, OR

torvalds/linux 89016

Linux kernel source tree

Subsurface-divelog/subsurface 1694

This is the official upstream of the Subsurface divelog program

fusijie/Cocos-Resource 1158

:books:Cocos 资料大全(全版本)

nabilfreeman/ios-universal-webview-boilerplate 799

Universal Swift-based boilerplate for a web app.

torvalds/uemacs 510

Random version of microemacs with my private modificatons

NikolayIT/OpenJudgeSystem 373

An open source system for online algorithm competitions for Windows, written in ASP.NET MVC

torvalds/test-tlb 256

Stupid memory latency and TLB tester

NikolayIT/RatioMaster.NET 153

Ratiomaster.NET is a small standalone application which fakes upload and download stats of a torrent to almost all bittorrent trackers. This means that it does NOT rely on your bittorrent client (uTorrent, Azureus, BitComet, ABC and etc.) and it will NOT download/upload the files on a torrent - it only can fake download/upload. RatioMaster.NET has hardcoded emulations for the most commonly used BitTorrent clients: uTorrent, BitComet, Azureus, ABC, BitLord, BTuga, BitTornado, Burst, BitTyrant, BitSpirit.

tuna/danmaQ 150

danmaku implemented in Qt5

torvalds/pesconvert 131

Brother PES file converter

push eventtorvalds/linux

Guo Ren

commit sha 12879bda3c2a974b7e4fe199a9c21f0c5f6bca04

csky: Fixup init_fpu compile warning with __init WARNING: vmlinux.o(.text+0x2366): Section mismatch in reference from the function csky_start_secondary() to the function .init.text:init_fpu() The function csky_start_secondary() references the function __init init_fpu(). This is often because csky_start_secondary lacks a __init annotation or the annotation of init_fpu is wrong. Reported-by: Lu Chongzhi <chongzhi.lcz@alibaba-inc.com> Signed-off-by: Guo Ren <guoren@linux.alibaba.com>

view details

Guo Ren

commit sha bfe47f358ad298a1efb9b8f8299a81541d90df87

csky: Implement ptrace regs and stack API Needed for kprobes support. Copied and adapted from Patrick's patch, but it has been modified for csky's pt_regs. ref: https://lore.kernel.org/linux-riscv/1572919114-3886-2-git-send-email-vincent.chen@sifive.com/raw Signed-off-by: Guo Ren <ren_guo@c-sky.com> Cc: Patrick Staehlin <me@packi.ch>

view details

Guo Ren

commit sha 9866d141a0977ace974400bf1f793dfc163409ce

csky: Add support for restartable sequence Copied and adapted from vincent's patch, but modified for csky. ref: https://lore.kernel.org/linux-riscv/1572919114-3886-3-git-send-email-vincent.chen@sifive.com/raw Add calls to rseq_signal_deliver(), rseq_handle_notify_resume() and rseq_syscall() to introduce RSEQ support. 1. Call the rseq_handle_notify_resume() function on return to userspace if TIF_NOTIFY_RESUME thread flag is set. 2. Call the rseq_signal_deliver() function to fixup on the pre-signal frame when a signal is delivered on top of a restartable sequence critical section. 3. Check that system calls are not invoked from within rseq critical sections by invoking rseq_signal() from ret_from_syscall(). With CONFIG_DEBUG_RSEQ, such behavior results in termination of the process with SIGSEGV. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>

view details

Guo Ren

commit sha 89a3927a775c0a7212e2e3c4e2d42cd48895bee0

csky: Implement ftrace with regs This patch implements FTRACE_WITH_REGS for csky, which allows a traced function's arguments (and some other registers) to be captured into a struct pt_regs, allowing these to be inspected and/or modified. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>

view details

Rafael J. Wysocki

commit sha a00ec3874e7d326ab2dffbed92faddf6a77a84e9

cpufreq: intel_pstate: Select schedutil as the default governor Modify cpufreq Kconfig to select schedutil as the default governor if the intel_pstate driver has been selected and SMP support is enabled (because schedutil depends on SMP). Also select schedutil as well as the performance governor from the intel_pstate Kconfig section to ensure the equivalence of the passive and active mode governor configuration options. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Liguang Zhang

commit sha b17b80645453f8c3174d5d3b55e590cb6a76ca29

ACPI: CPPC: clean up acpi_get_psd_map() In acpi_get_psd_map() variable all_cpu_data[] can't be NULL and variable match_cpc_ptr has been checked before, no need check again at the end of the funchtion. Some additional optimizations can be made on top of that. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Bob Moore

commit sha 1c040f3a6a63304676b037c04e32f45c6b648d26

ACPICA: Fix a typo in a comment field ACPICA commit f3504c591c8766c70402dcc786391ff6748b515a Link: https://github.com/acpica/acpica/commit/f3504c59 Reported-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Michał Żygowski

commit sha 1f6239ca16fda8048367d47fc4979cbe8f8c658f

ACPICA: Implement IVRS IVHD type 11h parsing ACPICA commit 6ddc19419896e4149ced1b5f35f0dc12516c0399 The AMD IVRS table parsing supported only IVHD type 10h structures. Parsing an IVHD type 11h caused the iasl to report unknown subtable type. Add necessary structure definition for IVHD type 11h and apply correct parsing method based on subtable type. Link: https://github.com/acpica/acpica/commit/6ddc1941 Signed-off-by: Michał Żygowski <michal.zygowski@3mdeb.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Michał Żygowski

commit sha 0dc7e795204c830812c79795fd17b329d9a8cbab

ACPICA: Fix IVRS IVHD type 10h reserved field name ACPICA commit 87a1ab2b2a63e28776261c48bdbae345f790d05d According to AMD IOMMU Specification Revision 3.05 the reserved field should be IOMMU Feature Reporting. Change the name of the field to the correct one. Link: https://github.com/acpica/acpica/commit/87a1ab2b Signed-off-by: Michał Żygowski <michal.zygowski@3mdeb.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Erik Kaneda

commit sha 26b22d7999d9043c4039de0713ffa2a983cd7226

ACPICA: Change PlatformCommChannel ASL keyword to PCC ACPICA commit 811e69a59cb4189ebf8b882eba74c881f598a239 The former was proposed during specification discussions but it was dropped. This keyword was introduced to the ACPICA code base by mistake so this commit changes the keyword representing Platform Communication Channel to be PCC. Link: https://github.com/acpica/acpica/commit/811e69a5 Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Erik Kaneda

commit sha f2173c3e7df2cc33d15a6cfd2fbf78a4bca63a4e

ACPICA: acpiexec: remove redeclaration of acpi_gbl_db_opt_no_region_support ACPICA commit 825c53661cacc7e3dab4844588201846143bd1b7 This variable was re-defined in a file specific to acpiexec. Remove the redundant declaration and move the initialize to the debugger. Patch based on suggestions by David Seifert and Benjamin Berg. Link: https://github.com/acpica/acpica/commit/825c5366 Reported-by: David Seifert <soap@gentoo.org> Reported-by: Benjamin Berg <bberg@redhat.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

John Levon

commit sha bb89a79ac6a605c5f02eced174e986e1d8b4cd45

ACPICA: utilities: fix sprintf() This contains changes for the following ACPICA commit ID's: 8f99a6ccd3b8e5c3d3d68c53fdbb054c2477eeb4 d30647af53abd334cbcf6362387464ea647bac9e d3c5fb4cf5b2880d789c987eb847fc3de3774abc On 32-bit, the provided sprintf() is non-functional: with a size of ACPI_UINT32_MAX, String + Size will wrap, meaning End < Start, and acpi_ut_bound_string_output() will never output anything as a result. The symptom we saw of this was acpixtract failing to output anything. Link: https://github.com/acpica/acpica/commit/8f99a6cc Link: https://github.com/acpica/acpica/commit/d30647af Link: https://github.com/acpica/acpica/commit/d3c5fb4c Signed-off-by: MSathieu <18145111+MSathieu@users.noreply.github.com> Signed-off-by: John Levon <john.levon@joyent.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Erik Kaneda

commit sha afb908708b39845837bbc13026bad5af9b8e5b83

ACPICA: WSMT: Fix typo, no functional change ACPICA commit 764d18c5a83949ff3b0dbda6055cee1929b9caa2 The table signature WSMT stands for "Windows SMM Mitigations Table". It is not "Windows SMM Migrations Table". Link: https://github.com/acpica/acpica/commit/764d18c5 Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Cezary Rojewski

commit sha 88055d8f4a715d52b9b981f717e9e51cf4668d5d

ACPICA: Add NHLT table signature ACPICA commit 422166b656565d180bb3aac712009bdce5e70cdd NHLT (Non-HDAudio Link Table) provides configuration of audio endpoints for Intel SST (Smart Sound Technology) DSP products. Similarly to other ACPI tables, data provided by BIOS may not describe it correctly, thus overriding is required. ACPI override mechanism checks for unknown signature before proceeding. Update known signatures array to support NHLT. Link: https://github.com/acpica/acpica/commit/422166b6 Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Bob Moore

commit sha 9a1ae80412dcaa67a29eecf19de44f32b5f1c357

ACPICA: Fixes for acpiExec namespace init file This is the result of squashing the following ACPICA commit ID's: 6803997e5b4f3635cea6610b51ff69e29d251de3 f31cdf8bfda22fe265c1a176d0e33d311c82a7f7 This change fixes several problems with the support for the acpi_exec namespace init file (-fi option). Specifically, it fixes AE_ALREADY_EXISTS errors, as well as various seg faults. Link: https://github.com/acpica/acpica/commit/f31cdf8b Link: https://github.com/acpica/acpica/commit/6803997e Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Bob Moore

commit sha 6461e59cc2bc3b40da768159a1e64c01401cf336

ACPICA: Update version 20200326 ACPICA commit 994fe943d93fe18eaaed1b6cd725023399f8a5c6 Version 20200326. Link: https://github.com/acpica/acpica/commit/994fe943 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Erik Kaneda <erik.kaneda@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

Guo Ren

commit sha dd7c983e78a28ff0b22f8bcf32a303b4f79cb318

csky/ftrace: Fixup ftrace_modify_code deadlock without CPU_HAS_ICACHE_INS If ICACHE_INS is not supported, we use IPI to sync icache on each core. But ftrace_modify_code is called from stop_machine from default implementation of arch_ftrace_update_code and stop_machine callback is irq_disabled. When you call ipi with irq_disabled, a deadlock will happen. We couldn't use icache_flush with irq_disabled, but startup make_nop is specific case and it needn't ipi other cores. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>

view details

Guo Ren

commit sha 9c0e343d7654a329d1f9b53d253cbf7fb6eff85d

csky: Fixup get wrong psr value from phyical reg We should get psr value from regs->psr in stack, not directly get it from phyiscal register then save the vector number in tsk->trap_no. Signed-off-by: Guo Ren <guoren@linux.alibaba.com>

view details

Ma Jun

commit sha de8636787119193ea1a4b7c1a13be0de5b64210f

csky: Enable the gcov function Support the gcov function in csky architecture. Signed-off-by: Ma Jun <majun258@linux.alibaba.com> Signed-off-by: Guo Ren <guoren@kernel.org>

view details

Randy Dunlap

commit sha 5fd769c2bf115c23670acbc54875717b177c7e0e

ACPI: video: Docs update for "acpi_backlight" kernel parameter options Update the Documentation for "acpi_backlight" by adding 2 new options (native and none). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

view details

push time in 7 hours

push eventtorvalds/linux

Jan Kara

commit sha bc36dfffd5f3f19edcf85954d93eb0bc45875c37

ext2: Silence lockdep warning about reclaim under xattr_sem Lockdep complains about a chain: sb_internal#2 --> &ei->xattr_sem#2 --> fs_reclaim and shrink_dentry_list -> ext2_evict_inode -> ext2_xattr_delete_inode -> down_write(ei->xattr_sem) creating a locking cycle in the reclaim path. This is however a false positive because when we are in ext2_evict_inode() we are the only holder of the inode reference and nobody else should touch xattr_sem of that inode. So we cannot ever block on acquiring the xattr_sem in the reclaim path. Silence the lockdep warning by using down_write_trylock() in ext2_xattr_delete_inode() to not create false locking dependency. Reported-by: "J. R. Okajima" <hooanon05g@gmail.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Gustavo A. R. Silva

commit sha c2d0699c629d30dde3329003baac3b94f1d717e6

ext2: xattr.h: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Link: https://lore.kernel.org/r/20200309180441.GA2992@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Gustavo A. R. Silva

commit sha 3fc131663cec7ca8e05bf25d99cecc2ab56bc50d

udf: udf_sb.h: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Link: https://lore.kernel.org/r/20200309202715.GA9428@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Jan Kara

commit sha 32302085a8d90859c40cf1a5e8313f575d06ec75

ext2: fix debug reference to ext2_xattr_cache Fix a debug-only build error in ext2/xattr.c: When building without extra debugging, (and with another patch that uses no_printk() instead of <empty> for the ext2-xattr debug-print macros, this build error happens: ../fs/ext2/xattr.c: In function ‘ext2_xattr_cache_insert’: ../fs/ext2/xattr.c:869:18: error: ‘ext2_xattr_cache’ undeclared (first use in this function); did you mean ‘ext2_xattr_list’? atomic_read(&ext2_xattr_cache->c_entry_count)); Fix the problem by removing cached entry count from the debug message since otherwise we'd have to export the mbcache structure just for that. Fixes: be0726d33cb8 ("ext2: convert to mbcache2") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Randy Dunlap

commit sha 44a52022e7f15cbaab957df1c14f7a4f527ef7cf

ext2: fix empty body warnings when -Wextra is used When EXT2_ATTR_DEBUG is not defined, modify the 2 debug macros to use the no_printk() macro instead of <nothing>. This fixes gcc warnings when -Wextra is used: ../fs/ext2/xattr.c:252:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:258:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:330:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:872:45: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body] I have verified that the only object code change (with gcc 7.5.0) is the reversal of some instructions from 'cmp a,b' to 'cmp b,a'. Link: https://lore.kernel.org/r/e18a7395-61fb-2093-18e8-ed4f8cf56248@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha 6473ea760ca1707ade74de5b57e74189d14f8e10

fsnotify: tidy up FS_ and FAN_ constants Order by value, so the free value ranges are easier to find. Link: https://lore.kernel.org/r/20200319151022.31456-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha eae36a2b8324c9dd0a66b2ae32abd4a456e49c39

fsnotify: factor helpers fsnotify_dentry() and fsnotify_file() Most of the code in fsnotify hooks is boiler plate of one or the other. Link: https://lore.kernel.org/r/20200319151022.31456-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha a1aae0570a2b806937120921db2c5d3100ca55fc

fsnotify: funnel all dirent events through fsnotify_name() Factor out fsnotify_name() from fsnotify_dirent(), so it can also serve link and rename events and use this helper to report all directory entry change events. Both helpers return void because no caller checks their return value. Link: https://lore.kernel.org/r/20200319151022.31456-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha aa93bdc5500cc93ba31afeda1a61610d117947ad

fsnotify: use helpers to access data by data_type Create helpers to access path and inode from different data types. Link: https://lore.kernel.org/r/20200319151022.31456-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha 017de65fe58f2b0ca428b5830609520ded5898b9

fsnotify: simplify arguments passing to fsnotify_parent() Instead of passing both dentry and path and having to figure out which one to use, pass data/data_type to simplify the code. Link: https://lore.kernel.org/r/20200319151022.31456-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha dfc2d2594e4a79204a3967585245f00644b8f838

fsnotify: replace inode pointer with an object id The event inode field is used only for comparison in queue merges and cannot be dereferenced after handle_event(), because it does not hold a refcount on the inode. Replace it with an abstract id to do the same thing. Link: https://lore.kernel.org/r/20200319151022.31456-8-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha f367a62a7cad2447d835a9f14fc63997a9137246

fanotify: merge duplicate events on parent and child With inotify, when a watch is set on a directory and on its child, an event on the child is reported twice, once with wd of the parent watch and once with wd of the child watch without the filename. With fanotify, when a watch is set on a directory and on its child, an event on the child is reported twice, but it has the exact same information - either an open file descriptor of the child or an encoded fid of the child. The reason that the two identical events are not merged is because the object id used for merging events in the queue is the child inode in one event and parent inode in the other. For events with path or dentry data, use the victim inode instead of the watched inode as the object id for event merging, so that the event reported on parent will be merged with the event reported on the child. Link: https://lore.kernel.org/r/20200319151022.31456-9-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha 55bf882c7f13dda8bbe624040c6d5b4fbb812d16

fanotify: fix merging marks masks with FAN_ONDIR Change the logic of FAN_ONDIR in two ways that are similar to the logic of FAN_EVENT_ON_CHILD, that was fixed in commit 54a307ba8d3c ("fanotify: fix logic of events on child"): 1. The flag is meaningless in ignore mask 2. The flag refers only to events in the mask of the mark where it is set This is what the fanotify_mark.2 man page says about FAN_ONDIR: "Without this flag, only events for files are created." It doesn't say anything about setting this flag in ignore mask to stop getting events on directories nor can I think of any setup where this capability would be useful. Currently, when marks masks are merged, the FAN_ONDIR flag set in one mark affects the events that are set in another mark's mask and this behavior causes unexpected results. For example, a user adds a mark on a directory with mask FAN_ATTRIB | FAN_ONDIR and a mount mark with mask FAN_OPEN (without FAN_ONDIR). An opendir() of that directory (which is inside that mount) generates a FAN_OPEN event even though neither of the marks requested to get open events on directories. Link: https://lore.kernel.org/r/20200319151022.31456-10-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Jan Kara

commit sha a741c2febeadc675aef84bcd6924b6522577d593

fanotify: Simplify create_fd() create_fd() is never used with invalid path. Also the only thing it needs to know from fanotify_event is the path. Simplify the function to take path directly and assume it is correct. Signed-off-by: Jan Kara <jack@suse.cz>

view details

Jan Kara

commit sha afc894c784c84cb3bb85a235feca2cb278f7b023

fanotify: Store fanotify handles differently Currently, struct fanotify_fid groups fsid and file handle and is unioned together with struct path to save space. Also there is fh_type and fh_len directly in struct fanotify_event to avoid padding overhead. In the follwing patches, we will be adding more event types and this packing makes code difficult to follow. So unpack everything and create struct fanotify_fh which groups members logically related to file handle to make code easier to follow. In the following patch we will pack things again differently to make events smaller. Signed-off-by: Jan Kara <jack@suse.cz>

view details

Jan Kara

commit sha 7088f35720a55b99624ea36091538baec7ec611f

fanotify: divorce fanotify_path_event and fanotify_fid_event Breakup the union and make them both inherit from abstract fanotify_event. fanotify_path_event, fanotify_fid_event and fanotify_perm_event inherit from fanotify_event. type field in abstract fanotify_event determines the concrete event type. fanotify_path_event, fanotify_fid_event and fanotify_perm_event are allocated from separate memcache pools. Rename fanotify_perm_event casting macro to FANOTIFY_PERM(), so that FANOTIFY_PE() and FANOTIFY_FE() can be used as casting macros to fanotify_path_event and fanotify_fid_event. [JK: Cleanup FANOTIFY_PE() and FANOTIFY_FE() to be proper inline functions and remove requirement that fanotify_event is the first in event structures] Link: https://lore.kernel.org/r/20200319151022.31456-11-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha 9e2ba2c34f1922ca1e0c7d31b30ace5842c2e7d1

fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name Dirent events are going to be supported in two flavors: 1. Directory fid info + mask that includes the specific event types (e.g. FAN_CREATE) and an optional FAN_ONDIR flag. 2. Directory fid info + name + mask that includes only FAN_DIR_MODIFY. To request the second event flavor, user needs to set the event type FAN_DIR_MODIFY in the mark mask. The first flavor is supported since kernel v5.1 for groups initialized with flag FAN_REPORT_FID. It is intended to be used for watching directories in "batch mode" - the watcher is notified when directory is changed and re-scans the directory content in response. This event flavor is stored more compactly in the event queue, so it is optimal for workloads with frequent directory changes. The second event flavor is intended to be used for watching large directories, where the cost of re-scan of the directory on every change is considered too high. The watcher getting the event with the directory fid and entry name is expected to call fstatat(2) to query the content of the entry after the change. Legacy inotify events are reported with name and event mask (e.g. "foo", FAN_CREATE | FAN_ONDIR). That can lead users to the conclusion that there is *currently* an entry "foo" that is a sub-directory, when in fact "foo" may be negative or non-dir by the time user gets the event. To make it clear that the current state of the named entry is unknown, when reporting an event with name info, fanotify obfuscates the specific event types (e.g. create,delete,rename) and uses a common event type - FAN_DIR_MODIFY to describe the change. This should make it harder for users to make wrong assumptions and write buggy filesystem monitors. At this point, name info reporting is not yet implemented, so trying to set FAN_DIR_MODIFY in mark mask will return -EINVAL. Link: https://lore.kernel.org/r/20200319151022.31456-12-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha d766b553615ce67f33848ec4f6672b3158198a5c

fanotify: prepare to report both parent and child fid's For some events, we are going to report both child and parent fid's, so pass fsid and file handle as arguments to copy_fid_to_user(), which is going to be called with parent and child file handles. Link: https://lore.kernel.org/r/20200319151022.31456-13-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

Jan Kara

commit sha 01affd5471dcab04c6cb0c2acaf132a20488f86f

fanotify: Drop fanotify_event_has_fid() When some events have directory id and some object id, fanotify_event_has_fid() becomes mostly useless and confusing because we usually need to know which type of file handle the event has. So just drop the function and use fanotify_event_object_fh() instead. Signed-off-by: Jan Kara <jack@suse.cz>

view details

Amir Goldstein

commit sha cacfb956d46edc5d7a1165161bfc6ca3de9519d9

fanotify: record name info for FAN_DIR_MODIFY event For FAN_DIR_MODIFY event, allocate a variable size event struct to store the dir entry name along side the directory file handle. At this point, name info reporting is not yet implemented, so trying to set FAN_DIR_MODIFY in mark mask will return -EINVAL. Link: https://lore.kernel.org/r/20200319151022.31456-14-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>

view details

push time in 8 hours

push eventtorvalds/linux

Paul Cercueil

commit sha 5b11e5d784c2dab41cd3171f4d297bdcac1b85ed

power/supply: ingenic-battery: Don't print error on -EPROBE_DEFER Don't print an error message if devm_power_supply_register() returns -EPROBE_DEFER, since the driver will simply re-probe later. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Baolin Wang

commit sha 1c5dfc5e3f2df5892a849621d729daa87cfef436

power: supply: sc27xx: Add POWER_SUPPLY_PROP_CHARGE_NOW attribute Add the POWER_SUPPLY_PROP_CHARGE_NOW attribute to allow user to get current battery capacity (uAh) to do measurement. Signed-off-by: Baolin Wang <baolin.wang7@gmail.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Baolin Wang

commit sha 241eaabc3c315cdfea505725a43de848f498527f

power: supply: Allow charger manager can be built as a module Allow charger manager can be built as a module like other charger drivers. Signed-off-by: Baolin Wang <baolin.wang7@gmail.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Ashish Chavan

commit sha ddb74e985f2d3db1ef5688121c8b1961f0c9d80d

power: supply: ab8500_charger: Fix typos in commit messages Trivial fix to spelling mistake in commit messages. Signed-off-by: Ashish Chavan <ashish.gschavan@gmail.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Jeffery Miller

commit sha e42fe5b29ac07210297e75f36deefe54edbdbf80

power: supply: axp288_fuel_gauge: Broaden vendor check for Intel Compute Sticks. The Intel Compute Stick `STK1A32SC` can have a system vendor of "Intel(R) Client Systems". Broaden the Intel Compute Stick DMI checks so that they match "Intel Corporation" as well as "Intel(R) Client Systems". This fixes an issue where the STK1A32SC compute sticks were still exposing a battery with the existing blacklist entry. Signed-off-by: Jeffery Miller <jmiller@neverware.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Hans de Goede

commit sha 9c80662a74cd2a5d1113f5c69d027face963a556

power: supply: axp288_charger: Add special handling for HP Pavilion x2 10 Some HP Pavilion x2 10 models use an AXP288 for charging and fuel-gauge. We use a native power_supply / PMIC driver in this case, because on most models with an AXP288 the ACPI AC / Battery code is either completely missing or relies on custom / proprietary ACPI OpRegions which Linux does not implement. The native drivers mostly work fine, but there are 2 problems: 1. These model uses a Type-C connector for charging which the AXP288 does not support. As long as a Type-A charger (which uses the USB data pins for charger type detection) is used everything is fine. But if a Type-C charger is used (such as the charger shipped with the device) then the charger is not recognized. So we end up slowly discharging the device even though a charger is connected, because we are limiting the current from the charger to 500mA. To make things worse this happens with the device's official charger. Looking at the ACPI tables HP has "solved" the problem of the AXP288 not being able to recognize Type-C chargers by simply always programming the input-current-limit at 3000mA and relying on a Vhold setting of 4.7V (normally 4.4V) to limit the current intake if the charger cannot handle this. 2. If no charger is connected when the machine boots then it boots with the vbus-path disabled. On other devices this is done when a 5V boost converter is active to avoid the PMIC trying to charge from the 5V boost output. This is done when an OTG host cable is inserted and the ID pin on the micro-B receptacle is pulled low, the ID pin has an ACPI event handler associated with it which re-enables the vbus-path when the ID pin is pulled high when the OTG cable is removed. The Type-C connector has no ID pin, there is no ID pin handler and there appears to be no 5V boost converter, so we end up not charging because the vbus-path is disabled, until we unplug the charger which automatically clears the vbus-path disable bit and then on the second plug-in of the adapter we start charging. The HP Pavilion x2 10 models with an AXP288 do have mostly working ACPI AC / Battery code which does not rely on custom / proprietary ACPI OpRegions. So one possible solution would be to blacklist the AXP288 native power_supply drivers and add the HP Pavilion x2 10 with AXP288 DMI ids to the list of devices which should use the ACPI AC / Battery code even though they have an AXP288 PMIC. This would require changes to 4 files: drivers/acpi/ac.c, drivers/power/supply/axp288_charger.c, drivers/acpi/battery.c and drivers/power/supply/axp288_fuel_gauge.c. Beside needing adding the same DMI matches to 4 different files, this approach also triggers problem 2. from above, but then when suspended, during suspend the machine will not wakeup because the vbus path is disabled by the AML code when not charging, so the Vbus low-to-high IRQ is not triggered, the CPU never wakes up and the device does not charge even though the user likely things it is charging, esp. since the charge status LED is directly coupled to an adapter being plugged in and does not reflect actual charging. This could be worked by enabling vbus-path explicitly from say the axp288_charger driver's suspend handler. So neither situation is ideal, in both cased we need to explicitly enable the vbus-path to work around different variants of problem 2 above, this requires a quirk in the axp288_charger code. If we go the route of using the ACPI AC / Battery drivers then we need modifications to 3 other drivers; and we need to partially disable the axp288_charger code, while at the same time keeping it around to enable vbus-path on suspend. OTOH we can copy the hardcoding of 3A input-current-limit (we never touch Vhold, so that would stay at 4.7V) to the axp288_charger code, which needs changes regardless, then we concentrate all special handling of this interesting device model in the axp288_charger code. That is what this commit does. Cc: stable@vger.kernel.org BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1791098 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha b2a16610f2ba72468b4a7ac1e462af1e9c70bae8

power: reset: at91-reset: introduce struct at91_reset Introduce struct at91_reset intended to keep all the at91 reset controller data. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 4d9ce0f56aeeb46c8be0e8974646f163d3f6b72d

power: reset: at91-reset: add ramc_base[] to struct at91_reset Add ramc_base[] to struct at91_reset. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha f9e6ce74cbf2a34f4d37bccbbfc7865e5b3e01dd

power: reset: at91-reset: add sclk to struct at91_reset Add sclk to struct at91_reset. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 1e3c4af9de26a0246cf00aba207c49846e80c37b

power: reset: at91-reset: add notifier block to struct at91_reset Add struct notifier_block to struct at91_reset. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha b7967b7919f0e3787d6e23424b5ad367fbb937e2

power: reset: at91-reset: convert reset in pointer to struct at91_reset Convert reset in pointer to struct at91_reset. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 55f8e6fdefbe926e1729bba3a7608eed7d7f24f5

power: reset: at91-reset: pass rstc base address to at91_reset_status() Add new argument to at91_reset_status() that is the pointer to reset controller base address. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 583ef884c8dc8e5ec9a7fde12d40ef7fdf18f5fb

power: reset: at91-reset: devm_kzalloc() for at91_reset data structure Allocate at91_reset data on probe and set it as platform data. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha a5bbad258a9ec41df4a158c828b9ef0af7955854

power: reset: at91-reset: introduce struct at91_reset_data Introduce struct at91_reset_data to be able to provide per SoC data. At the moment this being only notifier callback. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 25b80b7d5a5b41cb52db441e37a04d71e7196f60

power: reset: at91-reset: introduce args member in at91_reset_data Introduce args member in struct at91_reset_data. It stores the value that needs to be written in mode register so that the reboot actions to happen. With these changes samx7_restart() could be removed. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 7cb290d3dd559dff5028b9418a31ecb99988f640

power: reset: at91-reset: use r4 as tmp argument Use r4 as temporary register. On ARM r0-r3 should be used to hold function arguments. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 68a84a3e68a2238036dd6c7dbb677fe62a92d70a

power: reset: at91-reset: introduce ramc_lpr to struct at91_reset Introduce ramc_lpr to struct at91_reset. This will lead to the unification of at91sam9260_restart() and at91sam9g45_restart(). Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha fcd0532fac2ac82cec6d7b43298c8ecfc93e1049

power: reset: at91-reset: make at91sam9g45_restart() generic Make at91sam9g45_restart() generic. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 51aa7d45f905ca03d3c61d32a09537903a7ea707

power: reset: at91-reset: keep only one reset function Keep only one reset function. With this, notifier_call member of struct at91_reset_data could be removed. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

Claudiu.Beznea@microchip.com

commit sha 766b0162e613a89a330382e566e29da57f5f0de5

power: reset: at91-reset: get rid of at91_reset_data After refactoring struct at91_reset_data and struct at91_reset_data at91sam9260_reset_data are not needed anymore. Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

view details

push time in a day

push eventtorvalds/linux

Mike Looijmans

commit sha 692751879ea876dec81357e6f1f8ca0ced131ceb

clk, clk-si5341: Support multiple input ports The Si5341 and Si5340 have multiple input clock options. So far, the driver only supported the XTAL input, this adds support for the three external clock inputs as well. If the clock chip isn't programmed at boot, the driver will default to the XTAL input as before. If there is no "xtal" clock input available, it will pick the first connected input (e.g. "in0") as the input clock for the PLL. One can use clock-assigned-parents to select a particular clock as input. Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl> Link: https://lkml.kernel.org/r/20200107075340.14528-1-mike.looijmans@topic.nl Signed-off-by: Stephen Boyd <sboyd@kernel.org>

view details

Geert Uytterhoeven

commit sha 6e26901a00c04436e9e7a323772b71348aeab651

clk: renesas: rcar-gen3: Add CCREE clocks Add the CryptoCell module clocks and their parents for the CryptoCell instances in the various Renesas R-Car Gen3 SoCs that do not have support for them yet in their clock drivers (M3-W/W+, M3-N, E3, D3). The R-Car H3 clock driver already supports these clocks. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200124133137.15921-1-geert+renesas@glider.be

view details

Dirk Behme

commit sha 9e6f3b44dc75666bd1861f49aea0f8dfa8c67499

clk: renesas: r8a7795: Add RPC clocks Describe the RPCSRC internal clock and the RPC[D2] clocks derived from it, as well as the RPC-IF module clock, in the R-Car H3 (R8A7795) CPG/MSSR driver. Inspired by commit 94e3935b5756 ("clk: renesas: r8a77980: Add RPC clocks"). Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20200203072901.31548-1-dirk.behme@de.bosch.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

view details

Dirk Behme

commit sha 715286f51d13f125de6dbc13f2d707a7893bf95d

clk: renesas: r8a7796: Add RPC clocks Describe the RPCSRC internal clock and the RPC[D2] clocks derived from it, as well as the RPC-IF module clock, in the R-Car M3-W/M3-W+ (R8A7796) CPG/MSSR driver. Inspired by commit 94e3935b5756 ("clk: renesas: r8a77980: Add RPC clocks"). Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20200203072901.31548-2-dirk.behme@de.bosch.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

view details

Dirk Behme

commit sha 808eab15f39be1ed9502244d2aa4f35209f9d0d5

clk: renesas: r8a77965: Add RPC clocks Describe the RPCSRC internal clock and the RPC[D2] clocks derived from it, as well as the RPC-IF module clock, in the R-Car M3-N (R8A77965) CPG/MSSR driver. Inspired by commit 94e3935b5756 ("clk: renesas: r8a77980: Add RPC clocks"). Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Link: https://lore.kernel.org/r/20200203072901.31548-3-dirk.behme@de.bosch.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

view details

Jernej Skrabec

commit sha 1de8493069b8f510e586242c1b4329cd3c9b6fb9

clk: sunxi-ng: a64: Export MBUS clock MBUS clock will be referenced in MBUS controller node. Export it. Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Jernej Skrabec

commit sha 2b48dcb7a821fc38e0be3e171bd02e058196ccf1

clk: sunxi-ng: sun8i-de2: Split out H5 definitions H5 has less clocks and resets than A64. Currently that's not obvious because A64 is missing rotation core related clocks and reset. Split out H5 definition. A64 structures will be fixed in subsequent commit. Note that this patch depends on commit 19368d99746e ("clk: sunxi-ng: add support for Allwinner H3 DE2 CCU") for the H3 clock list. Fixes: 763c5bd045b1 ("clk: sunxi-ng: add support for DE2 CCU") Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Jernej Skrabec

commit sha b4bbce660a3694811b5500c40b9cd69513b50d38

clk: sunxi-ng: sun8i-de2: Add rotation core clocks and reset for A64 A64 has rotation core which needs clocks and reset. Because there is no appropriate structures available, make a separate, A64 specific structures. Fixes: cf4881c12935 ("clk: sunxi-ng: fix the A64/H5 clock description of DE2 CCU") Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Jernej Skrabec

commit sha 75250eb75c828eebda77e23f44de0f037546ea9a

clk: sunxi-ng: sun8i-de2: H6 doesn't have rotate core DE3 documentation regarding presence of rotate core in H6 is a bit confusing. Register descriptions mention bits for enabling rotate core clocks and reset, but general overview doesn't list it as feature of H6 display engine, BSP kernel doesn't support it and there is no interrupt listed for it. Manual poking registers also didn't reveal presence of rotate core. Let's assume there isn't any rotate core on H6 present and remove related clocks. With that done, structures are same as those for H5, so just reuse H5 structure. Fixes: 56808da9f97f ("clk: sunxi-ng: Add support for H6 DE3 clocks") Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Jernej Skrabec

commit sha 8f9b11a33ad6976e17e7279a170864a1fc613803

clk: sunxi-ng: sun8i-de2: Don't reuse A83T resets Currently, V3s and H3 reuse A83T reset structure. However, A83T contains additional core for rotation, which is not present in V3s and H3. Make new reset structure for H3 and let V3s reuse it. A83T reset structure will be amended in subsequent commit. Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Jernej Skrabec

commit sha b0bfba905cf81d540acaac7a4b23459f7df1bb24

clk: sunxi-ng: sun8i-de2: Add rotation core clocks and reset for A83T A83T structures don't have clocks and reset for rotation core. Add them. Fixes: 763c5bd045b1 ("clk: sunxi-ng: add support for DE2 CCU") Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Jernej Skrabec

commit sha 11d0c436ffed1dba397e9adec9eaebfe7eef2d10

clk: sunxi-ng: sun8i-de2: Add R40 specific quirks R40 is actually very similar to A64, but it doesn't have mixer1 reset. This means it's clocks and resets combination is unique and R40 specific quirks are needed. Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Jernej Skrabec

commit sha b998b75f8603c37c8a43a3d849cd2c4929adf402

clk: sunxi-ng: sun8i-de2: Sort structures V3s quirks are not in right place. Move it. Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Taniya Das

commit sha 04ac0ad7e8edc7f1a7b3e219db245060fcaf59a4

dt-bindings: clk: qcom: Add support for GPU GX GDSCR In the cases where the GPU SW requires to use the GX GDSCR add support for the same. Signed-off-by: Taniya Das <tdas@codeaurora.org> Link: https://lkml.kernel.org/r/1581307266-26989-1-git-send-email-tdas@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>

view details

Taniya Das

commit sha 1a6151128c847c473da1c7e7022ef78d3e2d6689

clk: qcom: gpucc: Add support for GX GDSC for SC7180 Most of the time the CPU should not be touching the GX domain on the GPU except for a very special use case when the CPU needs to force the GX headswitch off. Add the GX domain for that use case. As part of this add a dummy enable function for the GX gdsc to simulate success so that the pm_runtime reference counting is correct. This matches what was done in sdm845 in commit 85a3d920d30a ("clk: qcom: Add a dummy enable function for GX gdsc"). Signed-off-by: Taniya Das <tdas@codeaurora.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Link: https://lkml.kernel.org/r/1581307266-26989-2-git-send-email-tdas@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>

view details

Stephen Boyd

commit sha f78f29079327a2f0052096f65a06f66838e82b70

clk: qcom: alpha-pll: Make error prints more informative I recently ran across this printk error message spewing in my logs Call set rate on the PLL with rounded rates! and I had no idea what clk that was or what rate was failing to round properly. Make the printk more informative by telling us what went wrong and also add the name of the clk that's failing to change rate. Furthermore, update the other printks in this file with the clk name each time so we know what clk we're talking about. Cc: Taniya Das <tdas@codeaurora.org> Cc: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lkml.kernel.org/r/20200205065421.9426-1-swboyd@chromium.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>

view details

Taniya Das

commit sha fdd373a4e0c859c64149aaacd082b6f4e58a6489

dt-bindings: clock: Add RPMHCC bindings for SM8250 Add bindings and update documentation for clock rpmh driver on SM8250. Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Vinod Koul <vkoul@kernel.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Taniya Das <tdas@codeaurora.org> Signed-off-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org> Link: https://lkml.kernel.org/r/1579905147-12142-2-git-send-email-vnkgutta@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>

view details

Taniya Das

commit sha 29093b1a5833c24c692abb1370de547d2a60e6be

clk: qcom: rpmh: Add support for RPMH clocks on SM8250 Add support for RPMH clocks on SM8250. Reviewed-by: Vinod Koul <vkoul@kernel.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Taniya Das <tdas@codeaurora.org> Signed-off-by: Venkata Narendra Kumar Gutta <vnkgutta@codeaurora.org> Link: https://lkml.kernel.org/r/1579905147-12142-3-git-send-email-vnkgutta@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>

view details

Stephen Boyd

commit sha f21cf9c77ee82ef8adfeb2143adfacf21ec1d5cc

clk: Don't cache errors from clk_ops::get_phase() We don't check for errors from clk_ops::get_phase() before storing away the result into the clk_core::phase member. This can lead to some fairly confusing debugfs information if these ops do return an error. Let's skip the store when this op fails to fix this. While we're here, move the locking outside of clk_core_get_phase() to simplify callers from the debugfs side. Cc: Douglas Anderson <dianders@chromium.org> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20200205232802.29184-2-sboyd@kernel.org Acked-by: Jerome Brunet <jbrunet@baylibre.com>

view details

Stephen Boyd

commit sha 768a5d4f63c29d3bed5abb3c187312fcf623fa05

clk: Use 'parent' to shorten lines in __clk_core_init() Some lines are getting long in this function. Let's move 'parent' up to the top of the function and use it in many places whenever there is a parent for a clk. This shortens some lines by avoiding core->parent-> indirections. Cc: Douglas Anderson <dianders@chromium.org> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Link: https://lkml.kernel.org/r/20200205232802.29184-3-sboyd@kernel.org Acked-by: Jerome Brunet <jbrunet@baylibre.com>

view details

push time in a day

push eventtorvalds/linux

Waiman Long

commit sha d3ec10aa95819bff18a0d936b18884c7816d0914

KEYS: Don't write out to userspace while holding key semaphore A lockdep circular locking dependency report was seen when running a keyutils test: [12537.027242] ====================================================== [12537.059309] WARNING: possible circular locking dependency detected [12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - - [12537.125253] ------------------------------------------------------ [12537.153189] keyctl/25598 is trying to acquire lock: [12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0 [12537.208365] [12537.208365] but task is already holding lock: [12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12537.270476] [12537.270476] which lock already depends on the new lock. [12537.270476] [12537.307209] [12537.307209] the existing dependency chain (in reverse order) is: [12537.340754] [12537.340754] -> #3 (&type->lock_class){++++}: [12537.367434] down_write+0x4d/0x110 [12537.385202] __key_link_begin+0x87/0x280 [12537.405232] request_key_and_link+0x483/0xf70 [12537.427221] request_key+0x3c/0x80 [12537.444839] dns_query+0x1db/0x5a5 [dns_resolver] [12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.496731] cifs_reconnect+0xe04/0x2500 [cifs] [12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.601045] kthread+0x30c/0x3d0 [12537.617906] ret_from_fork+0x3a/0x50 [12537.636225] [12537.636225] -> #2 (root_key_user.cons_lock){+.+.}: [12537.664525] __mutex_lock+0x105/0x11f0 [12537.683734] request_key_and_link+0x35a/0xf70 [12537.705640] request_key+0x3c/0x80 [12537.723304] dns_query+0x1db/0x5a5 [dns_resolver] [12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.775607] cifs_reconnect+0xe04/0x2500 [cifs] [12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.873477] kthread+0x30c/0x3d0 [12537.890281] ret_from_fork+0x3a/0x50 [12537.908649] [12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}: [12537.935225] __mutex_lock+0x105/0x11f0 [12537.954450] cifs_call_async+0x102/0x7f0 [cifs] [12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs] [12538.000659] cifs_readpages+0x120a/0x1e50 [cifs] [12538.023920] read_pages+0xf5/0x560 [12538.041583] __do_page_cache_readahead+0x41d/0x4b0 [12538.067047] ondemand_readahead+0x44c/0xc10 [12538.092069] filemap_fault+0xec1/0x1830 [12538.111637] __do_fault+0x82/0x260 [12538.129216] do_fault+0x419/0xfb0 [12538.146390] __handle_mm_fault+0x862/0xdf0 [12538.167408] handle_mm_fault+0x154/0x550 [12538.187401] __do_page_fault+0x42f/0xa60 [12538.207395] do_page_fault+0x38/0x5e0 [12538.225777] page_fault+0x1e/0x30 [12538.243010] [12538.243010] -> #0 (&mm->mmap_sem){++++}: [12538.267875] lock_acquire+0x14c/0x420 [12538.286848] __might_fault+0x119/0x1b0 [12538.306006] keyring_read_iterator+0x7e/0x170 [12538.327936] assoc_array_subtree_iterate+0x97/0x280 [12538.352154] keyring_read+0xe9/0x110 [12538.370558] keyctl_read_key+0x1b9/0x220 [12538.391470] do_syscall_64+0xa5/0x4b0 [12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf [12538.435535] [12538.435535] other info that might help us debug this: [12538.435535] [12538.472829] Chain exists of: [12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class [12538.472829] [12538.524820] Possible unsafe locking scenario: [12538.524820] [12538.551431] CPU0 CPU1 [12538.572654] ---- ---- [12538.595865] lock(&type->lock_class); [12538.613737] lock(root_key_user.cons_lock); [12538.644234] lock(&type->lock_class); [12538.672410] lock(&mm->mmap_sem); [12538.687758] [12538.687758] *** DEADLOCK *** [12538.687758] [12538.714455] 1 lock held by keyctl/25598: [12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12538.770573] [12538.770573] stack backtrace: [12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G [12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015 [12538.881963] Call Trace: [12538.892897] dump_stack+0x9a/0xf0 [12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279 [12538.932891] ? save_trace+0xd6/0x250 [12538.948979] check_prev_add.constprop.32+0xc36/0x14f0 [12538.971643] ? keyring_compare_object+0x104/0x190 [12538.992738] ? check_usage+0x550/0x550 [12539.009845] ? sched_clock+0x5/0x10 [12539.025484] ? sched_clock_cpu+0x18/0x1e0 [12539.043555] __lock_acquire+0x1f12/0x38d0 [12539.061551] ? trace_hardirqs_on+0x10/0x10 [12539.080554] lock_acquire+0x14c/0x420 [12539.100330] ? __might_fault+0xc4/0x1b0 [12539.119079] __might_fault+0x119/0x1b0 [12539.135869] ? __might_fault+0xc4/0x1b0 [12539.153234] keyring_read_iterator+0x7e/0x170 [12539.172787] ? keyring_read+0x110/0x110 [12539.190059] assoc_array_subtree_iterate+0x97/0x280 [12539.211526] keyring_read+0xe9/0x110 [12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0 [12539.249076] keyctl_read_key+0x1b9/0x220 [12539.266660] do_syscall_64+0xa5/0x4b0 [12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf One way to prevent this deadlock scenario from happening is to not allow writing to userspace while holding the key semaphore. Instead, an internal buffer is allocated for getting the keys out from the read method first before copying them out to userspace without holding the lock. That requires taking out the __user modifier from all the relevant read methods as well as additional changes to not use any userspace write helpers. That is, 1) The put_user() call is replaced by a direct copy. 2) The copy_to_user() call is replaced by memcpy(). 3) All the fault handling code is removed. Compiling on a x86-64 system, the size of the rxrpc_read() function is reduced from 3795 bytes to 2384 bytes with this patch. Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>

view details

Waiman Long

commit sha 4f0882491a148059a52480e753b7f07fc550e188

KEYS: Avoid false positive ENOMEM error on key read By allocating a kernel buffer with a user-supplied buffer length, it is possible that a false positive ENOMEM error may be returned because the user-supplied length is just too large even if the system do have enough memory to hold the actual key data. Moreover, if the buffer length is larger than the maximum amount of memory that can be returned by kmalloc() (2^(MAX_ORDER-1) number of pages), a warning message will also be printed. To reduce this possibility, we set a threshold (PAGE_SIZE) over which we do check the actual key length first before allocating a buffer of the right size to hold it. The threshold is arbitrary, it is just used to trigger a buffer length check. It does not limit the actual key length as long as there is enough memory to satisfy the memory request. To further avoid large buffer allocation failure due to page fragmentation, kvmalloc() is used to allocate the buffer so that vmapped pages can be used when there is not a large enough contiguous set of pages available for allocation. In the extremely unlikely scenario that the key keeps on being changed and made longer (still <= buflen) in between 2 __keyctl_read_key() calls, the __keyctl_read_key() calling loop in keyctl_read_key() may have to be iterated a large number of times, but definitely not infinite. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com>

view details

Linus Torvalds

commit sha 4c205c84e249e0a91dcfabe461d77667ec9b2d05

Merge tag 'keys-fixes-20200329' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyrings fixes from David Howells: "Here's a couple of patches that fix a circular dependency between holding key->sem and mm->mmap_sem when reading data from a key. One potential issue is that a filesystem looking to use a key inside, say, ->readpages() could deadlock if the key being read is the key that's required and the buffer the key is being read into is on a page that needs to be fetched. The case actually detected is a bit more involved - with a filesystem calling request_key() and locking the target keyring for write - which could be being read" * tag 'keys-fixes-20200329' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: KEYS: Avoid false positive ENOMEM error on key read KEYS: Don't write out to userspace while holding key semaphore

view details

push time in 2 days

push eventtorvalds/linux

Thomas Hellstrom (VMware)

commit sha f05a3849f6449f67843113778bf56e02f2b4ddf8

fs: Constify vma argument to vma_is_dax The function is used by upcoming vma_is_special_huge() with which we want to use a const vma argument. Since for vma_is_dax() the vma argument is only dereferenced for reading, constify it. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Acked-by: Christian König <christian.koenig@amd.com>

view details

Thomas Hellstrom (VMware)

commit sha 2484ca9b6a20451debb789d0a89af6f15de99826

mm: Introduce vma_is_special_huge For VM_PFNMAP and VM_MIXEDMAP vmas that want to support transhuge pages and -page table entries, introduce vma_is_special_huge() that takes the same codepaths as vma_is_dax(). The use of "special" follows the definition in memory.c, vm_normal_page(): "Special" mappings do not wish to be associated with a "struct page" (either it doesn't exist, or it exists but they don't want to touch it) For PAGE_SIZE pages, "special" is determined per page table entry to be able to deal with COW pages. But since we don't have huge COW pages, we can classify a vma as either "special huge" or "normal huge". Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>

view details

Thomas Hellstrom (VMware)

commit sha 327e9fd489727aa83e13c80ae517f46cdbe8979e

mm: Split huge pages on write-notify or COW The functions wp_huge_pmd() and wp_huge_pud() currently relies on the huge_fault() callback to split huge page table entries if needed. However for module users that requires export of the split_huge_xxx() functionality which may be undesired. Instead split pre-existing huge page-table entries on VM_FAULT_FALLBACK return. We currently only do COW and write-notify on the PTE level, so if the huge_fault() handler returns VM_FAULT_FALLBACK on wp faults, split the huge pages and page-table entries. Also do this for huge PUDs if there is no huge_fault() handler and the vma is not anonymous, similar to how it's done for PMDs. Note that fs/dax.c still does the splitting in the huge_fault() handler, but as huge_fault() A follow-up patch can remove the dax.c split_huge_pmd() if needed. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>

view details

Thomas Hellstrom (VMware)

commit sha 9a9731b18c9bb70c023f0b2c731726fd5167673e

mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries For graphics drivers needing to modify the page-protection, add huge page-table entries counterparts to vmf_insert_pfn_prot(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>

view details

Thomas Hellstrom (VMware)

commit sha 314b6580adc543cf21a1a84bce9c6a58a8dcb38c

drm/ttm, drm/vmwgfx: Support huge TTM pagefaults Support huge (PMD-size and PUD-size) page-table entries by providing a huge_fault() callback. We still support private mappings and write-notify by splitting the huge page-table entries on write-access. Note that for huge page-faults to occur, either the kernel needs to be compiled with trans-huge-pages always enabled, or the kernel needs to be compiled with trans-huge-pages enabled using madvise, and the user-space app needs to call madvise() to enable trans-huge pages on a per-mapping basis. Furthermore huge page-faults will not succeed unless buffer objects and user-space addresses are aligned on huge page size boundaries. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Christian König <christian.koenig@amd.com>

view details

Thomas Hellstrom (VMware)

commit sha 75390281ab68534fbecc71f370203d7981f04a3a

drm/vmwgfx: Support huge page faults With vmwgfx dirty-tracking we need a specialized huge_fault callback. Implement and hook it up. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Acked-by: Christian König <christian.koenig@amd.com>

view details

Thomas Hellstrom (VMware)

commit sha b182341667091c8edfb24a7caae600a2f08d7857

drm: Add a drm_get_unmapped_area() helper Unaligned virtual addresses makes it unlikely that huge page-table entries can be used. So align virtual buffer object address huge page boundaries to the underlying physical address huge page boundaries taking buffer object sizes into account to determine when it might be possible to use huge page-table entries. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Acked-by: Christian König <christian.koenig@amd.com>

view details

Thomas Hellstrom (VMware)

commit sha 7546f7ffdb5cbe12689d94fe657608364d2f7773

drm/vmwgfx: Introduce a huge page aligning TTM range manager Using huge page-table entries requires that the physical address of the start of a buffer object is huge page size aligned. Make a special version of the TTM range manager that accomplishes this, but falls back to a smaller page size alignment (PUD->PMD, PMD->NORMAL) to avoid eviction. If other drivers want to use it in the future, it can be made a TTM generic helper. Note that drivers can force eviction for a certain alignment by assigning the TTM GPU alignment correspondingly. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Acked-by: Christian König <christian.koenig@amd.com>

view details

Thomas Hellstrom (VMware)

commit sha 9431042dbc8ce490d49c7f9d5142805b6249208b

drm/vmwgfx: Hook up the helpers to align buffer objects Start using the helpers that align buffer object user-space addresses and buffer object vram addresses to huge page boundaries. This is to improve the chances of allowing huge page-table entries. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Acked-by: Christian König <christian.koenig@amd.com>

view details

Dave Airlie

commit sha 0e7e6198af28c1573267aba1be33dd0b7fb35691

Merge branch 'ttm-transhuge' of git://people.freedesktop.org/~thomash/linux into drm-next Huge page-table entries for TTM In order to reduce CPU usage [1] and in theory TLB misses this patchset enables huge- and giant page-table entries for TTM and TTM-enabled graphics drivers. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom (VMware) <thomas_os@shipmail.org> Link: https://patchwork.freedesktop.org/patch/msgid/20200325073102.6129-1-thomas_os@shipmail.org

view details

Linus Torvalds

commit sha ea9448b254e253e4d95afaab071b341d86c11795

Merge tag 'drm-next-2020-04-03-1' of git://anongit.freedesktop.org/drm/drm Pull drm hugepage support from Dave Airlie: "This adds support for hugepages to TTM and has been tested with the vmwgfx drivers, though I expect other drivers to start using it" * tag 'drm-next-2020-04-03-1' of git://anongit.freedesktop.org/drm/drm: drm/vmwgfx: Hook up the helpers to align buffer objects drm/vmwgfx: Introduce a huge page aligning TTM range manager drm: Add a drm_get_unmapped_area() helper drm/vmwgfx: Support huge page faults drm/ttm, drm/vmwgfx: Support huge TTM pagefaults mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries mm: Split huge pages on write-notify or COW mm: Introduce vma_is_special_huge fs: Constify vma argument to vma_is_dax

view details

push time in 2 days

push eventtorvalds/linux

Namjae Jeon

commit sha 1acf1a564b6034b5af1e7fb23cb98cb3bb4f6003

exfat: add in-memory and on-disk structures and headers This adds in-memory and on-disk structures and headers. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 719c1e1829166da399903b13da6dcc5344e73449

exfat: add super block operations This adds the implementation of superblock operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 5f2aa075070cf5b2e6aae011d5a3390408d7d913

exfat: add inode operations This adds the implementation of inode operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha ca06197382bde0a3bc20215595d1c9ce20c6e341

exfat: add directory operations This adds the implementation of directory operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 98d917047e8b7f4bb2ff61d81b0ccd94a00444f9

exfat: add file operations This adds the implementation of file operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 31023864e67a5f390cefbe92f72343027dc3aa33

exfat: add fat entry operations This adds the implementation of fat entry operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 1e49a94cf707204b66a3fb242f2814712c941f52

exfat: add bitmap operations This adds the implementation of bitmap operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha c35b6810c4952ae5776607e2c1d6a587425d5834

exfat: add exfat cache This adds the implementation of exfat cache. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 772b29cca528fcb21374bb3e597d848779938d16

exfat: add misc operations This adds the implementation of misc operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 370e812b3ec190fa492c9fd5a80c38b086d105c0

exfat: add nls operations This adds the implementation of nls operations for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha b9d1e2e6265f5dc25e9f5dbfbde3e53d8a4958ac

exfat: add Kconfig and Makefile This adds the Kconfig and Makefile for exfat. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 88ab55f16aae90e2e974eb67cc2380edb92b0661

MAINTAINERS: add exfat filesystem Add myself and Sungjong Seo as exfat maintainer. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Namjae Jeon

commit sha 1a3c0509ce83ce48104907207423c6eb929caa59

staging: exfat: make staging/exfat and fs/exfat mutually exclusive Make staging/exfat and fs/exfat mutually exclusive to select the one between two same filesystem. Suggested-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Valdis Kletnieks

commit sha 9acd0d53800c55c6e2186e29b6433daf24617451

exfat: update file system parameter handling Al Viro recently reworked the way file system parameters are handled Update super.c to work with it in linux-next 20200203. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Linus Torvalds

commit sha 83eb69f3b80f7cf2ca6357fb9c23adc48632a0e3

Merge branch 'work.exfat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull exfat filesystem from Al Viro: "Shiny new fs/exfat replacement for drivers/staging/exfat" * 'work.exfat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: exfat: update file system parameter handling staging: exfat: make staging/exfat and fs/exfat mutually exclusive MAINTAINERS: add exfat filesystem exfat: add Kconfig and Makefile exfat: add nls operations exfat: add misc operations exfat: add exfat cache exfat: add bitmap operations exfat: add fat entry operations exfat: add file operations exfat: add directory operations exfat: add inode operations exfat: add super block operations exfat: add in-memory and on-disk structures and headers

view details

push time in 2 days

push eventtorvalds/linux

Scott Mayhew

commit sha 7627d7dc79a8edd4b8f946a66002ea4205203112

nfsd: set the server_scope during service startup Currently, nfsd4_encode_exchange_id() encodes the utsname nodename string in the server_scope field. In a multi-host container environemnt, if an nfsd container is restarted on a different host than it was originally running on, clients will see a server_scope mismatch and will not attempt to reclaim opens. Instead, set the server_scope while we're in a process context during service startup, so we get the utsname nodename of the current process and store that in nfsd_net. Signed-off-by: Scott Mayhew <smayhew@redhat.com> [bfields: fix up major_id too] Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Gustavo A. R. Silva

commit sha 469aef23aa4e49d5191050410a1422117db03e11

sunrpc: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Amol Grover

commit sha 51cae673d036a32fe5cca9ec61765aa09e322c59

sunrpc: Pass lockdep expression to RCU lists detail->hash_table[] is traversed using hlist_for_each_entry_rcu outside an RCU read-side critical section but under the protection of detail->hash_lock. Hence, add corresponding lockdep expression to silence false-positive warnings, and harden RCU lists. Signed-off-by: Amol Grover <frextrite@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Gustavo A. R. Silva

commit sha c0fb23f867b632293500d7900e0288cf17bfcb1a

svcrdma: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Madhuparna Bhowmik

commit sha 36a8049181d590939a793e55105bb5921ec8c8b8

fs: nfsd: nfs4state.c: Use built-in RCU list checking list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Madhuparna Bhowmik

commit sha 057a2274357786c637093866c6668d5ff048436f

fs: nfsd: fileache.c: Use built-in RCU list checking list_for_each_entry_rcu() has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 412055398b9e67e07347a936fc4a6adddabe9cf4

nfsd: Fix NFSv4 READ on RDMA when using readv svcrdma expects that the payload falls precisely into the xdr_buf page vector. This does not seem to be the case for nfsd4_encode_readv(). This code is called only when fops->splice_read is missing or when RQ_SPLICE_OK is clear, so it's not a noticeable problem in many common cases. Add new transport method: ->xpo_read_payload so that when a READ payload does not fit exactly in rq_res's page vector, the XDR encoder can inform the RPC transport exactly where that payload is, without the payload's XDR pad. That way, when a Write chunk is present, the transport knows what byte range in the Reply message is supposed to be matched with the chunk. Note that the Linux NFS server implementation of NFS/RDMA can currently handle only one Write chunk per RPC-over-RDMA message. This simplifies the implementation of this fix. Fixes: b04209806384 ("nfsd4: allow exotic read compounds") Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053 Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 7dcf4ab952d64ab6c2870ad414c4b81bd7346427

NFSD: Clean up nfsd4_encode_readv Address some minor nits I noticed while working on this function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 758a3bf9459d9daa19eac604f3dece77e0bf2441

svcrdma: Fix double svc_rdma_send_ctxt_put() in an error path This error path is almost never executed. Found by code inspection. Fixes: 99722fe4d5a6 ("svcrdma: Persistently allocate and DMA-map Send buffers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 96f194b715b61b11f0184c776a1283df8e152033

SUNRPC: Add xdr_pad_size() helper Introduce a helper function to compute the XDR pad size of a variable-length XDR object. Clean up: Replace open-coded calculation of XDR pad sizes. I'm sure I haven't found every instance of this calculation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 28155524eaa2eabdc97e588db195d0a45d7e4d6f

SUNRPC: Clean up: Replace dprintk and BUG_ON call sites in svcauth_gss.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha b20dfc3fcd6ed1e16c828c81e1fc6f4aea2cfa77

svcrdma: Create a generic tracing class for displaying xdr_buf layout This class can be used to create trace points in either the RPC client or RPC server paths. It simply displays the length of each part of an xdr_buf, which is useful to determine that the transport and XDR codecs are operating correctly. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 2426ddfdf169a4390e498b79b6d34d0d69c515bc

svcrdma: Remove svcrdma_cm_event() trace point Clean up. This trace point is no longer needed because the RDMA/core CMA code has an equivalent trace point that was added by commit ed999f820a6c ("RDMA/cma: Add trace points in RDMA Connection Manager"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha e604aad2cac7357162f661e45f2f60e46faa7b17

svcrdma: Use struct xdr_stream to decode ingress transport headers The logic that checks incoming network headers has to be scrupulous. De-duplicate: replace open-coded buffer overflow checks with the use of xdr_stream helpers that are used most everywhere else XDR decoding is done. One minor change to the sanity checks: instead of checking the length of individual segments, cap the length of the whole chunk to be sure it can fit in the set of pages available in rq_pages. This should be a better test of whether the server can handle the chunks in each request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 2fe8c446338e083a1f3c0ccaaaa20e7d48e71ebc

svcrdma: De-duplicate code that locates Write and Reply chunks Cache the locations of the Requester-provided Write list and Reply chunk so that the Send path doesn't need to parse the Call header again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 6fa5785e78d39f03d9fa33dea4dad2e7caf21e1e

svcrdma: Update synopsis of svc_rdma_send_reply_chunk() Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into the lower-level send functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 4554755ed81bb690d709168550aba5b46447f069

svcrdma: Update synopsis of svc_rdma_map_reply_msg() Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into those lower-level functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha db9602e40425c469dabaf8b3400ee553fbbd8307

svcrdma: Update synopsis of svc_rdma_send_reply_msg() Preparing for subsequent patches, no behavior change expected. Pass the RPC Call's svc_rdma_recv_ctxt deeper into the sendto() path. This enables passing more information about Requester- provided Write and Reply chunks into those lower-level functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha a406c563e84212c6c63c5204817e30e4c5250dbb

svcrdma: Rename svcrdma_encode trace points in send routines These trace points are misnamed: trace_svcrdma_encode_wseg trace_svcrdma_encode_write trace_svcrdma_encode_reply trace_svcrdma_encode_rseg trace_svcrdma_encode_read trace_svcrdma_encode_pzr Because they actually trace posting on the Send Queue. Let's rename them so that I can add trace points in the chunk list encoders that actually do trace chunk list encoding events. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

Chuck Lever

commit sha 5c266df52701635edfd49415b225fb17ceac5183

SUNRPC: Add encoders for list item discriminators Clean up. These are taken from the client-side RPC/RDMA transport to a more global header file so they can be used elsewhere. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

view details

push time in 2 days

push eventtorvalds/linux

Lubomir Rintel

commit sha a630fe34ddc09ec9b82df4867a8680d0929fe926

gpio: pxa: Avoid a warning when gpio0 and gpio1 IRQS are not there Not all platforms use those. Let's use platform_get_irq_byname_optional() instead platform_get_irq_byname() so that we avoid a useless warning: [ 1.359455] pxa-gpio d4019000.gpio: IRQ gpio0 not found [ 1.359583] pxa-gpio d4019000.gpio: IRQ gpio1 not found Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>

view details

Axel Lin

commit sha 47d7d116661993499d626f7ec6f7679e83d59f15

gpio: wcd934x: Don't change gpio direction in wcd_gpio_set The .set callback should just set output value. Signed-off-by: Axel Lin <axel.lin@ingics.com> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>

view details

Axel Lin

commit sha 47203198ed3df0f6896d07613182c05cb94110a5

gpio: wcd934x: Fix logic of wcd_gpio_get The check with register value and mask should be & rather than &&. While at it, also use "unsigned int" for value variable because regmap_read() takes unsigned int *val argument. Signed-off-by: Axel Lin <axel.lin@ingics.com> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>

view details

Andy Shevchenko

commit sha 046e14afb3561523efd0047c35c20793ae5f8848

gpio: Avoid kernel.h inclusion where it's possible Inclusion of kernel.h increases the mess with the header dependencies. Avoid kernel.h inclusion where it's possible. Besides that, clean up a bit other inclusions inside GPIO subsystem headers. It includes: - removal pin control bits (forward declaration and header) from linux/gpio.h - removal of.h from asm-generic/gpio.h - use of explicit headers in gpio/consumer.h - add FIXME note with regard to gpio.h inclusion in of_gpio,h Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20200205134336.20197-1-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Bartosz Golaszewski

commit sha 3f2e4c11e136e2cffd60dbc840b59ff65f017328

kfifo: provide noirqsave variants of spinlocked in and out helpers Provide variants of spinlocked kfifo_in() and kfifo_out() routines which don't disable interrupts. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Stefani Seibold <stefani@seibold.net>

view details

Bartosz Golaszewski

commit sha 5195a89e8583bba43ec13871a7226763e401b44e

kfifo: provide kfifo_is_empty_spinlocked() Provide two spinlocked versions of kfifo_is_empty() to be used with spinlocked variants of kfifo_in() and kfifo_out(). Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Stefani Seibold <stefani@seibold.net>

view details

Bartosz Golaszewski

commit sha dea9c80ee6726986d90260f135c83545427cbc4e

gpiolib: rework the locking mechanism for lineevent kfifo The read_lock mutex is supposed to prevent collisions between reading and writing to the line event kfifo but it's actually only taken when the events are being read from it. Drop the mutex entirely and reuse the spinlock made available to us in the waitqueue struct. Take the lock whenever the fifo is modified or inspected. Drop the call to kfifo_to_user() and instead first extract the new element from kfifo when the lock is taken and only then pass it on to the user after the spinlock is released. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Bartosz Golaszewski

commit sha 248ae1752e91438cb7cfc8432b88bc34410e8e4e

gpiolib: emit a debug message when adding events to a full kfifo Currently if the line-event kfifo is full, we just silently drop any new events. Add a ratelimited debug message so that we at least have some trace in the kernel log of event overflow. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>

view details

Bartosz Golaszewski

commit sha d2ac25798208fb85f866056cd7d8030eb919da99

gpiolib: provide a dedicated function for setting lineinfo We'll soon be filling out the gpioline_info structure in multiple places. Add a separate function that given a gpio_desc sets all relevant fields. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Bartosz Golaszewski

commit sha 51c1064e82e77b39a49889287ca50709303e2f26

gpiolib: add new ioctl() for monitoring changes in line info Currently there is no way for user-space to be informed about changes in status of GPIO lines e.g. when someone else requests the line or its config changes. We can only periodically re-read the line-info. This is fine for simple one-off user-space tools, but any daemon that provides a centralized access to GPIO chips would benefit hugely from an event driven line info synchronization. This patch adds a new ioctl() that allows user-space processes to reuse the file descriptor associated with the character device for watching any changes in line properties. Every such event contains the updated line information. Currently the events are generated on three types of status changes: when a line is requested, when it's released and when its config is changed. The first two are self-explanatory. For the third one: this will only happen when another user-space process calls the new SET_CONFIG ioctl() as any changes that can happen from within the kernel (i.e. set_transitory() or set_debounce()) are of no interest to user-space. Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

view details

Bartosz Golaszewski

commit sha 33f0c47b8fb4724792b16351a32b24902a5d3b07

tools: gpio: implement gpio-watch Add a simple program that allows to test the new LINECHANGED_FD ioctl(). Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Geert Uytterhoeven

commit sha 38a49742de113746dfb954afe7b1b65a098a69a8

rtc: sh: Restore devm_ioremap() alignment The alignment of the continuation of the devm_ioremap() call in sh_rtc_probe() was broken. Join the lines, as all parameters can fit on a single line. Fixes: 4bdc0d676a643140 ("remove ioremap_nocache and devm_ioremap_nocache") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200212084836.9511-1-geert+renesas@glider.be Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

view details

Srinivas Neeli

commit sha 4594d082dbe6385c2df6b838817ccd214c16b358

rtc: zynqmp: Clear alarm interrupt status before interrupt enable Fix multiple occurring interrupts for alarm interrupt. RTC module doesn't clear the alarm interrupt status bit immediately after the interrupt is triggered.This is due to the sticky nature of the alarm interrupt status register. The alarm interrupt status register can be cleared only after the second counter outruns the set alarm value. To fix multiple spurious interrupts, disable alarm interrupt in the handler and clear the status bit before enabling the alarm interrupt. Fixes: 11143c19eb57 ("rtc: add xilinx zynqmp rtc driver") Signed-off-by: Srinivas Neeli <srinivas.neeli@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Link: https://lore.kernel.org/r/1581503079-387-1-git-send-email-srinivas.neeli@xilinx.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

view details

Linus Walleij

commit sha b2929a9cb2fbfedf30c66033a865c8135a7c2184

Merge tag 'gpio-updates-for-v5.7-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into devel gpio updates for v5.7 part 1 - make irqs optional in gpio-pxa - improve the logic behind the get() and set() callbacks in gpio-wcd934x - add new kfifo helpers (acked by kfifo maintainer) - rework the locking mechanism for lineevent kfifo - implement a new ioctl() for watching changes on GPIO lines

view details

Thomas Richter

commit sha 0d6f1693f255795d5c747dc444d69c6512586d98

s390/cpum_sf: Rework sampling buffer allocation Adjust sampling buffer allocation depending on frequency and correct comments. Investigation on the interrupt handler revealed that almost always one interupt services one SDB, even when running with the maximum frequency of 100000. Very rarely there have been 2 SBD serviced per interrupt. Therefore reduce the number of SBD per CPU. Each SDB is one page in size. The new formula results in freq:4000 n_sdb:32 new:16 freq:10000 n_sdb:80 new:16 freq:20000 n_sdb:159 new:17 freq:40000 n_sdb:318 new:19 freq:50000 n_sdb:397 new:20 freq:62500 n_sdb:497 new:22 freq:83333 n_sdb:662 new:24 freq:100000 n_sdb:794 new:25 Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

view details

Harald Freudenberger

commit sha c4f762ff6b7766e0053e39d1d87d599384288048

s390/zcrypt: Support for CCA protected key block version 2 There will come a new CCA keyblock version 2 for protected keys delivered back to the OS. The difference is only the amount of available buffer space to be up to 256 bytes for version 2. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

view details

Anson Huang

commit sha a137e9b620bcf3925a8d72dd7ba723910d0bf976

rtc: snvs: Remove unused include of of_device.h There is nothing in use from of_device.h, remove it. Signed-off-by: Anson Huang <Anson.Huang@nxp.com> Link: https://lore.kernel.org/r/1581823666-16944-1-git-send-email-Anson.Huang@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

view details

Julian Wiedmann

commit sha dd62abd2d84d8fe09c644a7407a34c22cb3d43be

s390/qdio: clean up cdev access in qdio_setup_irq() Some parts use init_data->cdev, others use irq_ptr->cdev. In the end it's all the same, but unnecessarily confusing. Use a single reference instead. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

view details

Julian Wiedmann

commit sha 014816b66218d9f5f90e6d92951abc9d3749b4cd

s390/qdio: reduce access to cdev->private->qdio_data Remove all usage of cdev->private->qdio_data that's buried deep in internal code. This should only be used by the exported driver API, which can then pass around a proper qdio_irq pointer. Also trivially merge some initializations with their definitions. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

view details

Stefan Raspl

commit sha b059a39cfa27c04e8e03e4ddf44f16501f36357d

s390/arch: install kernels with their proper version ID In case $INSTALLKERNEL is not available, we should install the kernel image with its version number, and save the previous one accordingly. Also, we're adding a hint so users know that they still need to perform one more configuration step (usually adjusting zipl config). Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

view details

push time in 2 days

push eventtorvalds/linux

Eugeniy Paltsev

commit sha f61f530c5a144ac548c5648c91441663b12437b8

ARC: [plat-axs10x]: PGU: remove unused encoder-slave property ARC PGU is looking for encoder via endpoint mechanism and doesn't use "encoder-slave" property for a long time. Let's drop unused "encoder-slave" property from ARC PGU node in axs10x. Acked-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

view details

Eugeniy Paltsev

commit sha 240c84b1c22c9912ed1c8a96251b44e85d2ca2ed

ARC: add helpers to sanitize config options We'll use this macro in coming patches extensively. Reviewed-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

view details

Eugeniy Paltsev

commit sha 4827d0cf744e7e9cc73f10e1f4eaca904a3868c1

ARC: handle DSP presence in HW When DSP extensions are present, some of the regular integer instructions such as DIV, MACD etc are executed in the DSP unit with semantics alterable by flags in DSP_CTRL aux register. This register is writable by userspace and thus can potentially affect corresponding instructions in kernel code, intentionally or otherwise. So safegaurd kernel by effectively disabling DSP_CTRL upon bootup and every entry to kernel. Do note that for this config we simply zero out the DSP_CTRL reg assuming userspace doesn't really care about DSP. The next patch caters to the DSP aware userspace where this reg is saved/restored upon kernel entry/exit. Reviewed-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

view details

Eugeniy Paltsev

commit sha 7321e2ea0d6aece516a9c0827028ecda2ccaeae9

ARC: add support for DSP-enabled userspace applications To be able to run DSP-enabled userspace applications we need to save and restore following DSP-related registers: At IRQ/exception entry/exit: * DSP_CTRL (save it and reset to value suitable for kernel) * ACC0_LO, ACC0_HI (we already save them as r58, r59 pair) At context switch: * ACC0_GLO, ACC0_GHI * DSP_BFLY0, DSP_FFT_CTRL Reviewed-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

view details

Eugeniy Paltsev

commit sha f09d3174f002ee2cf15623d5a0f68f7393536ce7

ARC: allow userspace DSP applications to use AGU extensions To be able to run DSP-enabled userspace applications with AGU (address generation unit) extensions we additionally need to save and restore following registers at context switch: * AGU_AP* * AGU_OS* * AGU_MOD* Reviewed-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

view details

Linus Torvalds

commit sha 5364abc57993b3bf60c41923cb98a8f1a594e749

Merge tag 'arc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: - Support for DSP enabled userspace (save/restore regs) - Misc other platform fixes * tag 'arc-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: allow userspace DSP applications to use AGU extensions ARC: add support for DSP-enabled userspace applications ARC: handle DSP presence in HW ARC: add helpers to sanitize config options ARC: [plat-axs10x]: PGU: remove unused encoder-slave property

view details

push time in 3 days

push eventtorvalds/linux

Andre Przywara

commit sha 30bd02bd634f4a483e965fb41a076e47ea9681ef

arm64: dts: sun50i: H6: Add SPI controllers nodes and pinmuxes The Allwinner H6 SoC contains two SPI controllers similar to the H3/A64, but with the added capability of 3-wire and 4-wire operation modes. For now the driver does not support those, but the SPI registers are fully backwards-compatible, just adding bits and registers which were formerly reserved. So we can use the existing driver in "legacy" SPI modes, for instance to access the SPI NOR flash soldered on the PineH64 board. We use an H6 specific compatible string in addition to the existing H3 string, so when the driver later gains QSPI support, it should work automatically without any DT changes. Tested by accessing the SPI flash on a Pine H64 board (SPI0), also connecting another SPI flash to the SPI1 header pins. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Andre Przywara

commit sha e757bdd01780d0ea3e6774247b735caf2d1a9e92

arm64: dts: allwinner: h6: Pine H64: Add SPI flash node The Pine H64 board comes with SPI flash soldered on the board, connected to the SPI0 pins (so it can also boot from there). Add the required SPI flash DT node to describe this. Unfortunately the SPI CS0 pin collides with the eMMC CMD pin, so we can't use both eMMC and SPI flash at the same time (the first to claim the pin would win, the other's probe routine would then fail). To avoid losing the more useful eMMC device by chance, mark the SPI device as "disabled" for now. A user or some U-Boot code could fix this up if needed, for instance if no eMMC has been detected (it's socketed). Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Andre Przywara

commit sha e2c9e67e44fedcf37b383d3cacc525674bed457f

dt-bindings: spi: sunxi: Document new compatible strings The Allwinner H6 SPI controller has advanced features over the H3 version, but remains compatible with it. Document the usual "specific", "fallback" compatible string pair. Also add the R40 version while at it. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Icenowy Zheng

commit sha 39b6343d1d4180b83edec31e1d0b36fe1ccfaf77

dt-bindings: arm: sunxi: add binding for PineTab tablet Add the device tree binding for Pine64's PineTab tablet, which uses Allwinner A64 SoC. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Icenowy Zheng

commit sha 674ef1d0a7b23681699994a892a1ed508c7df122

arm64: dts: allwinner: a64: add support for PineTab PineTab is a 10.1" tablet by Pine64 with Allwinner A64 inside. It includes the following peripherals: USB: - A microUSB Type-B port connected to the OTG-capable USB PHY of Allwinner A64. The ID pin is connected to a GPIO of the A64 SoC, and the Vbus is connected to the Vbus of AXP803 PMIC. These enables OTG functionality on this port. - A USB Type-A port is connected to the internal hub attached to the non-OTG USB PHY of Allwinner A64. - There are reserved pins for an external keyboard connected to the internal hub. Power: - The microUSB port has its Vbus connected to AXP803, mentioned above. - A DC jack (of a strange size, 2.5mm outer diameter) is connected to the ACIN of AXP803. - A Li-Polymer battery is connected to the battery pins of AXP803. Storage: - An tradition Pine64 eMMC slot is on the board, mounted with an eMMC module by factory. - An external microSD slot is hidden under a protect case. Display: - A MIPI-DSI LCD panel (800x1280) is connected to the DSI port of A64 SoC. - A mini HDMI port. Input: - A touch panel attached to a Goodix GT9271 touch controller. - Volume keys connected to the LRADC of the A64 SoC. Camera: - An OV5640 CMOS camera is at rear, connected to the CSI bus of A64 SoC. - A GC2145 CMOS camera is at front, shares the same CSI bus with OV5640. Audio: - A headphone jack is conencted to the SoC's internal codec. - A speaker connected is to the Line Out port of SoC's internal codec, via an amplifier. Misc: - Debug UART is muxed with the headphone jack, with the switch next to the microSD slot. - A bosch BMA223 accelerometer is connected to the I2C bus of A64 SoC. - Wi-Fi and Bluetooth are available via a RTL8723CS chip, similar to the one in Pinebook. This commit adds a basically usable device tree for it, implementing most of the features mentioned above. HDMI is not supported now because bad LCD-HDMI coexistence situation of mainline A64 display driver, the front camera currently lacks a driver and a facility to share the bus with the rear one, and the accelerometer currently lacks a DT binding. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha 787615ad874151af11017e95e29b7f1ef1c1590f

arm64: dts: allwinner: Enable button wakeup on Orange Pi PC2 The Orange Pi PC2 features a GPIO button. As the button is connected to Port L (pin PL3), it can be used as a wakeup source. Enable this. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha 9e556ec5731eb3a7df3ee27e832d74b17390499e

arm64: dts: allwinner: pinebook: Remove unused vcc3v3 regulator This fixed regulator has no consumers, GPIOs, or other connections. Remove it. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha c0e79b069e4ff6e0881d82ba831424ca7362e828

arm64: dts: allwinner: pinebook: Sort device tree nodes The r_i2c node should come before r_rsb, and in any case should not separate the axp803 node from its subnodes. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha c3d22680df8da15c6ee2a6a157fcf3078915cc26

arm64: dts: allwinner: pinebook: Make simplefb more consistent Boards generally reference the simplefb nodes from the SoC dtsi by label, not by full path. simplefb_hdmi is already like this in the Pinebook DTS. Update simplefb_lcd to match. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha 412e19c34f501da68c19d8fdd7de4c535ea89d3b

arm64: dts: allwinner: pinebook: Document MMC0 CD pin name Normally GPIO pin references are followed by a comment giving the pin name for searchability. Add the comment here where it was missing. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha 9de2b6bf5ecb1f2149ffe19e4aefbdcbe68790f1

arm64: dts: allwinner: pinebook: Add GPIO port regulators Allwinner A64 SoC has separate supplies for PC, PD, PE, PG and PL. VCC-PC and VCC-PG are supplied by ELDO1 at 1.8v. VCC-PD is supplied by DCDC1 (VCC-IO) at 3.3v. VCC-PE is supplied by ALDO1, and is unused. VCC-PL creates a circular dependency, so it is omitted for now. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha 47ef030c3a796e07fec12c6981bbb46bcb258e2e

arm64: dts: allwinner: pinebook: Fix backlight regulator The output from the backlight regulator is labeled as "VBKLT" in the schematic. Using the equation and resistor values from the schematic, the output is approximately 18V, not 3.3V. Since the regulator in use (SS6640STR) is a boost regulator powered by PS (battery or AC input), which are both >3.3V, the output could not be 3.3V anyway. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Samuel Holland

commit sha e95d8d03b5904a373895c14a44eb60264c7bc041

arm64: dts: allwinner: pinebook: Fix 5v0 boost regulator Now that AXP803 GPIO support is available, we can properly model the hardware. Replace the use of GPIO0-LDO with a fixed regulator controlled by GPIO0. This boost regulator is used to power the (internal and external) USB ports, as well as the speakers. Signed-off-by: Samuel Holland <samuel@sholland.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Emmanuel Vadot

commit sha 5a5e52161894ab70062e6989aa15f672664334f8

arm64: dts: allwinner: a64: Add gpio bank supply for A64-Olinuxino Add the regulators for each bank on this boards. For VCC-PL only add a comment on what regulator is used. We cannot add the property without causing a circular dependency as the PL pins are used to talk to the PMIC. Signed-off-by: Emmanuel Vadot <manu@freebsd.org> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Yangtao Li

commit sha 6a7be15a66e6138d7efad71278177d8432bd4ea5

ARM: dts: sun8i-r40: Add thermal sensor and thermal zones There are two sensors, sensor0 for CPU, sensor1 for GPU. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech>

view details

Viresh Kumar

commit sha 71af05a7d0eb63fa5a6b64f296a5f2c8d84d6a9e

firmware: arm_scmi: Update doc style comments Fix minor formatting issues with the doc style comments. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/1bff7c0d1ad2c8b6eeff9660421f414f8c612eb2.1580448239.git.viresh.kumar@linaro.org Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

view details

Viresh Kumar

commit sha c4eb83660aef549a6a50965ec5f0af9c8306d2e9

firmware: arm_scmi: Move macros and helpers to common.h Move message header specific macros and helper routines to common.h as they will be used outside of driver.c in a later commit. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/6615db480370719b0a0241447a5f3feb8eea421f.1580448239.git.viresh.kumar@linaro.org Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

view details

Viresh Kumar

commit sha 5c8a47a5a91d4d6e185f758d61997613d9c5d6ac

firmware: arm_scmi: Make scmi core independent of the transport type The SCMI specification is fairly independent of the transport protocol, which can be a simple mailbox (already implemented) or anything else. The current Linux implementation however is very much dependent on the mailbox transport layer. This patch makes the SCMI core code (driver.c) independent of the mailbox transport layer and moves all mailbox related code to a new file: mailbox.c and all struct shared_mem related code to a new file: shmem.c. We can now implement more transport protocols to transport SCMI messages. The transport protocols just need to provide struct scmi_transport_ops, with its version of the callbacks to enable exchange of SCMI messages. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/r/8698a3cec199b8feab35c2339f02dc232bfd773b.1580448239.git.viresh.kumar@linaro.org Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

view details

Geert Uytterhoeven

commit sha 408324a3c5383716939eea8096a0f999a0665f7e

ARM: shmobile: Enable ARM_GLOBAL_TIMER on Cortex-A9 MPCore SoCs SH-Mobile AG5 and R-Car H1 SoCs are based on the Cortex-A9 MPCore, which includes a global timer. Enable the ARM global timer on these SoCs, which will be used for: - the scheduler clock, improving scheduler accuracy from 10 ms to 3 or 4 ns, - delay loops, allowing removal of calls to shmobile_init_delay() from the corresponding machine vectors. Note that when using an old DTB lacking the global timer, the kernel will still work. However, loops-per-jiffies will no longer be preset, and the delay loop will need to be calibrated during boot. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20191211135222.26770-5-geert+renesas@glider.be

view details

Marek Vasut

commit sha 516f68943a6ad4aadfa384e4ce3751e03e1afbda

ARM: dts: renesas: Add missing ethernet PHY reset GPIO on Gen2 reference boards The ethernet PHY reset GPIO was missing and the kernel was depending solely on the bootloader to bring the PHY out of reset. Fix this to get rid of the dependency on bootloader. Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Link: https://lore.kernel.org/r/20200115051225.7346-1-marek.vasut@gmail.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

view details

push time in 3 days

push eventtorvalds/linux

Sushma Kalakota

commit sha 449a01d2659c207b012e6d3bb6edfff8c94a4e55

PCI: vmd: Add two VMD Device IDs Add new VMD device IDs that require the bus restriction mode. Signed-off-by: Sushma Kalakota <sushmax.kalakota@intel.com> Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

view details

Colin Ian King

commit sha e3cdcfcea363edadebf94576c29d33a441fd5a6f

PCI/ACPI: Move pcie_to_hpx3_type[] from stack to static data Move pcie_to_hpx3_type[] from the stack to static data. This reduces stack usage and also makes the object code slightly smaller. Link: https://lore.kernel.org/r/20200210085256.319424-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

view details

Alexandru Gagniuc

commit sha 202853595e53f981c86656c49fc1cc1e3620f558

PCI: pciehp: Disable in-band presence detect when possible The presence detect state (PDS) is normally a logical OR of in-band and out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to disable in-band presence so that the PDS bit always reflects the state of the out-of-band presence. The recommendation of the PCIe spec is to disable in-band presence whenever supported (PCIe r5.0, appendix I implementation note): Due to architectural issues, the in-band (Physical-Layer-based) portion of the PD mechanism is deprecated for use with async hot-plug. One issue is that in-band PD as architected does not detect adapter removal during certain LTSSM states, notably the L1 and Disabled States. Another issue is that when both in-band and OOB PD are being used together, the Presence Detect State bit and its associated interrupt mechanism always reflect the logical OR of the inband and OOB PD states, and with some hot-plug hardware configurations, it is important for software to detect and respond to in-band and OOB PD events independently. If OOB PD is being used and the associated DSP supports In-Band PD Disable, it is recommended that the In-Band PD Disable bit be Set, and the Presence Detect State bit and its associated interrupt mechanism be used exclusively for OOB PD. As a substitute for in-band PD with async hot-plug, the reference model uses either the DPC or the DLL Link Active mechanism. Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com [bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD value (suggested by Lukas)] Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>

view details

Alexandru Gagniuc

commit sha f496648b99f8f7f6711f7c30a6327381f37dd1e8

PCI: pciehp: Wait for PDS if in-band presence is disabled When in-band presence detect is disabled, PDS may come up at any time or not at all. PDS being low may indicate that the card is still mating, and we could expect contact bounce to bring down the link as well. It is reasonable to assume that most cards will mate in a hotplug slot in about a second. Thus, when we know PDS only reflects out-of-band presence detect, it's worthwhile to wait the extra second or so to make sure the card is properly mated before loading the driver and to prevent the hotplug code from disabling a device if the presence detect change goes active after the device is enabled. Link: https://lore.kernel.org/r/20191025190047.38130-3-stuart.w.hayes@gmail.com [bhelgaas: use ctrl_info() instead of pci_info()] Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>

view details

Stuart Hayes

commit sha 0b382546d863f2f09eecaccda95a0b4bfd148f92

PCI: pciehp: Add DMI table for in-band presence detection disabled Some systems have in-band presence detection disabled for hot-plug PCI slots but do not report this in the slot capabilities 2 (SLTCAP2) register. On these systems, presence detect can become active well after the link is reported to be active, which can cause the slots to be disabled after a device is connected. Add a DMI table to flag these systems as having in-band presence detect disabled. Link: https://lore.kernel.org/r/20191025190047.38130-4-stuart.w.hayes@gmail.com Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>

view details

Hou Zhiqiang

commit sha 1f442218d657b1a20900f09ae1fc269b69b3de70

PCI: mobiveil: Introduce a new structure mobiveil_root_port The Mobiveil PCIe controller can work in either Root Complex mode or Endpoint mode. Introduce a new structure mobiveil_root_port and abstract the RC related members into it so that the code can be used by both modes. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Hou Zhiqiang

commit sha 2ba24842d6b42453dbc9dfffdf5faf0cec2d7698

PCI: mobiveil: Move the host initialization into a function Move the host initialization related operations into a new function so that it can be reused by other platform PCIe host drivers integrating the Mobiveil GPEX. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Hou Zhiqiang

commit sha 39e3a03eea5ba113f9ffe1cf39f0e0b4bc6a6713

PCI: mobiveil: Collect the interrupt related operations into a function Collect the interrupt initialization related operations into a new function to make code more readable. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Hou Zhiqiang

commit sha 03bdc3884019fb6463ac8163cc0e890920515f8b

PCI: mobiveil: Modularize the Mobiveil PCIe Host Bridge IP driver Modularize the Mobiveil PCIe host driver according to the abstraction of Root Complex and Endpoint and move it into a new directory in order to make it easier to reuse the driver functions to add new host drivers for systems integrating the Mobiveil PCIe GPEX IP. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com>

view details

Hou Zhiqiang

commit sha ed620e96541f3248ad7cfe069f98d43177ae0435

PCI: mobiveil: Add callback function for interrupt initialization The Mobiveil GPEX internal MSI/INTx controller is not implemented in all platforms in which the Mobiveil GPEX is integrated. Allow platforms to implement their specific interrupt initialization. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Hou Zhiqiang

commit sha fc99b3311af7125c46b56e753dc1a65c27b0d7e2

PCI: mobiveil: Add callback function for link up check Platforms integrating the Mobiveil GPEX can implement a specific mechanism to check the link status. Add a callback to enable platform specific link status functions. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> [lorenzo.pieralisi@arm.com: updated log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com>

view details

Hou Zhiqiang

commit sha 52cae4c7082f5f479f1692ba3f5ee6292d0aa4f9

PCI: mobiveil: Allow mobiveil_host_init() to be used to re-init host Allow the mobiveil_host_init() function to be used to re-init host controller's PAB and GPEX CSR register block, since the NXP integrated Mobiveil IP has to reset and then re-init the PAB and GPEX CSR registers upon hot-reset. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Hou Zhiqiang

commit sha 029dea3cdc67690736e35b78c9d8ed6da1c9ec98

PCI: mobiveil: Add 8-bit and 16-bit CSR register accessors There are some 8-bit and 16-bit registers in PCIe configuration space, so add these accessors accordingly. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Minghuan Lian <Minghuan.Lian@nxp.com> Reviewed-by: Subrahmanya Lingappa <l.subrahmanya@mobiveil.co.in> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Hou Zhiqiang

commit sha 11d22cc395ca1b9ff78255edaee3bf258515ad72

PCI: mobiveil: Add Header Type field check Check the Header Type and exit from the host driver initialization if it is not in host mode. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Hou Zhiqiang

commit sha 3edeb49525bbe34441618fdc5912b1ec37b2a61c

dt-bindings: PCI: Add NXP Layerscape SoCs PCIe Gen4 controller Add PCIe Gen4 controller DT bindings of NXP Layerscape SoCs. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Rob Herring <robh@kernel.org>

view details

Hou Zhiqiang

commit sha d29ad70a813b0daa424b70950d432c34a1a95865

PCI: mobiveil: Add PCIe Gen4 RC driver for Layerscape SoCs Add a PCI host controller driver for Layerscape SoCs integrating the Mobiveil GPEX IP. Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com> [lorenzo.pieralisi@arm.com: updated commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Minghuan Lian <Minghuan.Lian@nxp.com> Reviewed-by: Andrew Murray <amurray@thegoodpenguin.co.uk>

view details

Kishon Vijay Abraham I

commit sha 5779dd0a7dbd71e82478fb0bf125cc6cd3c43266

PCI: endpoint: Use notification chain mechanism to notify EPC events to EPF Use atomic_notifier_call_chain() to notify EPC events like linkup to EPF driver instead of using linkup ops in EPF driver. This is in preparation for adding proper locking mechanism to EPF ops. This will also enable to add more events (in addition to linkup) in the future. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Vidya Sagar <vidyas@nvidia.com>

view details

Kishon Vijay Abraham I

commit sha 3d3248dbd018502f654064c78efcd2e165ab3486

PCI: endpoint: Replace spinlock with mutex The pci_epc_ops is not intended to be invoked from interrupt context. Hence replace spin_lock_irqsave and spin_unlock_irqrestore with mutex_lock and mutex_unlock respectively. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

view details

Kishon Vijay Abraham I

commit sha 04e046ca57ebed3943422dee10eec9e73aec081e

PCI: endpoint: Fix for concurrent memory allocation in OB address region pci-epc-mem uses a bitmap to manage the Endpoint outbound (OB) address region. This address region will be shared by multiple endpoint functions (in the case of multi function endpoint) and it has to be protected from concurrent access to avoid updating an inconsistent state. Use a mutex to protect bitmap updates to prevent the memory allocation API from returning incorrect addresses. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org # v4.14+

view details

Kishon Vijay Abraham I

commit sha 07301c982643a432212840a4b648b5d3f5a061fa

PCI: endpoint: Protect concurrent access to pci_epf_ops with mutex Protect concurrent access to pci_epf_ops with a mutex. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

view details

push time in 3 days

push eventtorvalds/linux

Alexander Kapshuk

commit sha ff5cd9accbc724a793904c6e401040c85be89748

ver_linux: Query ld cache for versions of libc/libcpp run-time Query ld cache for versions of both libc and libcpp run-time, instead of querying /proc/self/maps for libc run-time, and ld cache for libcpp run-time, thus reducing code size and complexity. Signed-off-by: Alexander Kapshuk <alexander.kapshuk@gmail.com> Link: https://lore.kernel.org/r/20200209140057.20181-1-alexander.kapshuk@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

view details

Alexander Popov

commit sha 01c0514ec8226386ce8367dcd8814f86224caaeb

lkdtm/stackleak: Make the test more verbose Make the stack erasing test more verbose about the errors that it can detect. Signed-off-by: Alexander Popov <alex.popov@linux.com> Cc: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20200102234907.585508-1-alex.popov@linux.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

view details

Gustavo A. R. Silva

commit sha d0cff8adce1370c03ddb2ccb4d8c2921e00181c4

misc: vexpress: Replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertenly introduced[3] to the codebase from now on. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200211211010.GA32239@embeddedor Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

view details

Gustavo A. R. Silva

commit sha 6736041f9606f195339cacb4bcce232f1a2a1ed3

mei: bus: replace zero-length array with flexible-array member The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertenly introduced[3] to the codebase from now on. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200211210822.GA31368@embeddedor Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

view details

Tomas Winkler

commit sha 3aef021b2df7d8440225a53460c0d34b140297d5

mei: limit number of bytes in mei header. The MEI message header provides only 9 bits for storing the message size, limiting to 511. In theory the host buffer (hbuf) can contain up to 1020 bytes (limited by byte = 255 * 4) With the current hardware and hbuf size 512, this is not a real issue, but as hardening approach we enforce the limit. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20200211160522.7562-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

view details

Michal Koutný

commit sha f43caa2adc96fc9c95fd77eef63cdff86ebf33cb

cgroup: Clean up css_set task traversal css_task_iter stores pointer to head of each iterable list, this dates back to commit 0f0a2b4fa621 ("cgroup: reorganize css_task_iter") when we did not store cur_cset. Let us utilize list heads directly in cur_cset and streamline css_task_iter_advance_css_set a bit. This is no intentional function change. Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Suren Baghdasaryan

commit sha 04189382c0be4e045d63a38b47f012a8f6c35edc

kselftest/cgroup: add cgroup destruction test Add new test to verify that a cgroup with dead processes can be destroyed. The test spawns a child process which allocates and touches 100MB of RAM to ensure prolonged exit. Subsequently it kills the child, waits until the cgroup containing the child is empty and destroys the cgroup. Signed-off-by: Suren Baghdasaryan <surenb@google.com> [mkoutny@suse.com: Fix typo in test_cgcore_destroy comment] Acked-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Madhuparna Bhowmik

commit sha 3010c5b9f5f476b35b24955f08a3a6c06ec8e878

cgroup.c: Use built-in RCU list checking list_for_each_entry_rcu has built-in RCU and lock checking. Pass cond argument to list_for_each_entry_rcu() to silence false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled by default. Even though the function css_next_child() already checks if cgroup_mutex or rcu_read_lock() is held using cgroup_assert_mutex_or_rcu_locked(), there is a need to pass cond to list_for_each_entry_rcu() to avoid false positive lockdep warning. Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Acked-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Prateek Sood

commit sha a49e4629b5edf1db856de05fbf1aae05502ef1af

cpuset: Make cpuset hotplug synchronous Convert cpuset_hotplug_workfn() into synchronous call for cpu hotplug path. For memory hotplug path it still gets queued as a work item. Since cpuset_hotplug_workfn() can be made synchronous for cpu hotplug path, it is not required to wait for cpuset hotplug while thawing processes. Signed-off-by: Prateek Sood <prsood@codeaurora.org> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Greg Kroah-Hartman

commit sha 239a5791ffd5559f51815df442c4dbbe7fc21ade

dynamic_debug: allow to work if debugfs is disabled With the realization that having debugfs enabled on "production" systems is generally not a good idea, debugfs is being disabled from more and more platforms over time. However, the functionality of dynamic debugging still is needed at times, and since it relies on debugfs for its user api, having debugfs disabled also forces dynamic debug to be disabled. To get around this, also create the "control" file for dynamic_debug in procfs. This allows people turn on debugging as needed at runtime for individual driverfs and subsystems. Reported-by: many different companies Cc: Jason Baron <jbaron@akamai.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200210211142.GB1373304@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

view details

Christian Brauner

commit sha 6df970e4f5d2c273554550d40d8b92cea9bec1a0

cgroup: unify attach permission checking The core codepaths to check whether a process can be attached to a cgroup are the same for threads and thread-group leaders. Only a small piece of code verifying that source and destination cgroup are in the same domain differentiates the thread permission checking from thread-group leader permission checking. Since cgroup_migrate_vet_dst() only matters cgroup2 - it is a noop on cgroup1 - we can move it out of cgroup_attach_task(). All checks can now be consolidated into a new helper cgroup_attach_permissions() callable from both cgroup_procs_write() and cgroup_threads_write(). Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: cgroups@vger.kernel.org Acked-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Christian Brauner

commit sha 17703097f3456498e6424614571648c6452f4d34

cgroup: add cgroup_get_from_file() helper Add a helper cgroup_get_from_file(). The helper will be used in subsequent patches to retrieve a cgroup while holding a reference to the struct file it was taken from. Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: cgroups@vger.kernel.org Acked-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Christian Brauner

commit sha 5a5cf5cb30d7815c01035fde4b84edef85d11c68

cgroup: refactor fork helpers This refactors the fork helpers so they can be easily modified in the next patches. The patch just moves the cgroup threadgroup rwsem grab and release into the helpers. They don't need to be directly exposed in fork.c. Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: cgroups@vger.kernel.org Acked-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Christian Brauner

commit sha f3553220d4cc458d69f7da6e71a3a6097778bd28

cgroup: add cgroup_may_write() helper Add a cgroup_may_write() helper which we can use in the CLONE_INTO_CGROUP patch series to verify that we can write to the destination cgroup. Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: cgroups@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Christian Brauner

commit sha ef2c41cf38a7559bbf91af42d5b6a4429db8fc68

clone3: allow spawning processes into cgroups This adds support for creating a process in a different cgroup than its parent. Callers can limit and account processes and threads right from the moment they are spawned: - A service manager can directly spawn new services into dedicated cgroups. - A process can be directly created in a frozen cgroup and will be frozen as well. - The initial accounting jitter experienced by process supervisors and daemons is eliminated with this. - Threaded applications or even thread implementations can choose to create a specific cgroup layout where each thread is spawned directly into a dedicated cgroup. This feature is limited to the unified hierarchy. Callers need to pass a directory file descriptor for the target cgroup. The caller can choose to pass an O_PATH file descriptor. All usual migration restrictions apply, i.e. there can be no processes in inner nodes. In general, creating a process directly in a target cgroup adheres to all migration restrictions. One of the biggest advantages of this feature is that CLONE_INTO_GROUP does not need to grab the write side of the cgroup cgroup_threadgroup_rwsem. This global lock makes moving tasks/threads around super expensive. With clone3() this lock is avoided. Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: cgroups@vger.kernel.org Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Christian Brauner

commit sha 9bd5910d7f3db2f65be139d2679dd9daa4a3419a

selftests/cgroup: add tests for cloning into cgroups Expand the cgroup test-suite to include tests for CLONE_INTO_CGROUP. This adds the following tests: - CLONE_INTO_CGROUP manages to clone a process directly into a correctly delegated cgroup - CLONE_INTO_CGROUP fails to clone a process into a cgroup that has been removed after we've opened an fd to it - CLONE_INTO_CGROUP fails to clone a process into an invalid domain cgroup - CLONE_INTO_CGROUP adheres to the no internal process constraint - CLONE_INTO_CGROUP works with the freezer feature Cc: Tejun Heo <tj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: cgroups@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Acked-by: Roman Gushchin <guro@fb.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Tejun Heo <tj@kernel.org>

view details

Pierre-Louis Bossart

commit sha 59528807715f81f123631f57446b08219efa7526

soundwire: stream: update state machine and add state checks The state machine and notes don't accurately explain or allow transitions from STREAM_DEPREPARED and STREAM_DISABLED. Add more explanations and allow for more transitions as a result of a trigger_stop(), trigger_suspend() and prepare(), depending on the ALSA/ASoC layer behavior defined by the INFO_RESUME and INFO_PAUSE flags. Also add basic checks to help debug inconsistent states and illegal state machine transitions. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200114235227.14502-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>

view details

Bard Liao

commit sha c32464c9393d0a426b5abbf01980ff5ecfb34a98

soundwire: stream: only prepare stream when it is configured. We don't need to prepare the stream again if the stream is already prepared. sdw_prepare_stream() could be called multiple times without calling sdw_deprepare_stream(). We call sdw_prepare_stream() in the prepare dai ops and sdw_deprepare_stream() in the hw_free dai ops. If an xrun happens, sdw_prepare_stream() will be called but sdw_deprepare_stream() will not, which results in an imbalance and an invalid total bandwidth. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20200114235227.14502-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>

view details

Pierre-Louis Bossart

commit sha c7a8f049b828dc8e01acd56911a1816b7725d9c3

soundwire: stream: do not update parameters during DISABLED-PREPARED transition After a system suspend, the ALSA/ASoC core will invoke the .prepare() callback and a TRIGGER_START when INFO_RESUME is not supported. Likewise, when an underflow occurs, the .prepare callback will be invoked. In both cases, the stream can be in DISABLED mode, and will transition into the PREPARED mode. We however don't want the bus bandwidth to be recomputed. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20200114235227.14502-4-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>

view details

Rander Wang

commit sha 60835022e196de1a4d73c249e99f34b7204ca267

soundwire: stream: fix support for multiple Slaves on the same link The existing code will unconditionally return after dealing with the first Slave on a link. This return should only happen when there is an error case. Tested on Comet Lake platform. Signed-off-by: Rander Wang <rander.wang@intel.com> Link: https://lore.kernel.org/r/20200114235227.14502-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>

view details

push time in 3 days

push eventtorvalds/linux

chenqiwu

commit sha acd624185d204b0ae2360b4648cc50802df5da01

dmaengine: ti: dma-crossbar: convert to devm_platform_ioremap_resource() Use a new API devm_platform_ioremap_resource() to simplify code. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Link: https://lore.kernel.org/r/1580189746-2864-1-git-send-email-qiwuchen55@gmail.com Signed-off-by: Vinod Koul <vkoul@kernel.org>

view details

Jerome Brunet

commit sha 1dfa5a5ab34560fd9647083f623d19705be2e706

ASoC: core: allow a dt node to provide several components At the moment, querying the dai_name will stop of the first component matching the dt node. This does not allow a device (single dt node) to provide several ASoC components which could then be used through DT. This change let the search go on if the xlate function of the component returns an error, giving the possibility to another component to match and return the dai_name. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-2-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha 9c29fd9bdf92900dc0cc5c2d8f58951a7bdc0f41

ASoC: meson: g12a: extract codec-to-codec utils The hdmi routing mechanism used on g12a hdmi is also used: * other Amlogic SoC types * for the internal DAC path Each of these codec glues are slightly different but the idea behind it remains the same. This change extract some helper functions from the g12a-tohdmitx driver to make them available for other Amlogic codecs. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-3-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha 06b72824386795bf6f0a6ac0f0cfef6b7f0165c1

ASoC: meson: aiu: add audio output dt-bindings Add the dt-bindings and documentation of the AIU audio controller. This component provides most of the audio outputs found on the Amlogic Gx SoC family. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-4-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha 6ae9ca9ce986bffe71fd0fbf9595de8500891b52

ASoC: meson: aiu: add i2s and spdif support Add support for the i2s and spdif audio outputs (AIU) found in the amlogic Gx SoC family Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-5-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha b82b734c0e9a75e1b956214ac523a8eb590f51f3

ASoC: meson: aiu: add hdmi codec control support Add the codec to codec component which handles the routing between the audio producers (PCM and I2S) and the synopsys hdmi controller on the amlogic GX SoC family Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-6-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha 65816025d46169973d308d83fbcf5c3981ed5621

ASoC: meson: aiu: add internal dac codec control support Add the codec to codec component which handles the routing between the audio producers and the internal audio DAC found on the amlogic GXL SoC family Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-7-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha aa9c3b7273a58b5d9b2c1161b76b5fc8ea8c159b

ASoC: meson: axg: extract sound card utils This prepares the addition of the GX SoC family sound card driver. The GX sound card, while slightly different, will be similar to the AXG one. The purpose of this change is to share the utils common to both sound card driver. Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-8-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha fd00366b8e4125d29e32d49053a702ddf77430f6

ASoC: meson: gx: add sound card dt-binding documentation Add the dt-binding documentation of sound card supporting the amlogic GX SoC family Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-9-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Jerome Brunet

commit sha e37a0c313a0f8ba0b8de9c30db98fbc77bd8d446

ASoC: meson: gx: add sound card support Add support for the sound card used on the amlogic GX SoC family Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20200213155159.3235792-10-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>

view details

Rob Herring

commit sha 67ccd2b97db276fed5ca4c38c166182327d2f401

of/address: Move range parser code out of CONFIG_PCI In preparation to make the range parsing code work for non-PCI buses, move the parsing functions out from the CONFIG_PCI #ifdef. Signed-off-by: Rob Herring <robh@kernel.org>

view details

Sricharan R

commit sha eed1015c4c4240780789556d8fbf71df68ce6508

dt-bindings: pinctrl: qcom: Add ipq6018 pinctrl bindings Add device tree binding Documentation details for ipq6018 pinctrl driver. Co-developed-by: Rajkumar Ayyasamy <arajkuma@codeaurora.org> Signed-off-by: Rajkumar Ayyasamy <arajkuma@codeaurora.org> Co-developed-by: Selvam Sathappan Periakaruppan <speriaka@codeaurora.org> Signed-off-by: Selvam Sathappan Periakaruppan <speriaka@codeaurora.org> Co-developed-by: Sivaprakash Murugesan <sivaprak@codeaurora.org> Signed-off-by: Sivaprakash Murugesan <sivaprak@codeaurora.org> Signed-off-by: Sricharan R <sricharan@codeaurora.org> Signed-off-by: Sivaprakash Murugesan <sivaprak@codeaurora.org> Link: https://lore.kernel.org/r/1579439601-14810-2-git-send-email-sricharan@codeaurora.org Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Sricharan R

commit sha ef1ea54eab0ecb072700f59701387f839c8c760d

pinctrl: qcom: Add ipq6018 pinctrl driver Add initial pinctrl driver to support pin configuration with pinctrl framework for ipq6018. Co-developed-by: Rajkumar Ayyasamy <arajkuma@codeaurora.org> Signed-off-by: Rajkumar Ayyasamy <arajkuma@codeaurora.org> Co-developed-by: Selvam Sathappan Periakaruppan <speriaka@codeaurora.org> Signed-off-by: Selvam Sathappan Periakaruppan <speriaka@codeaurora.org> Co-developed-by: Sivaprakash Murugesan <sivaprak@codeaurora.org> Signed-off-by: Sivaprakash Murugesan <sivaprak@codeaurora.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Sricharan R <sricharan@codeaurora.org> Link: https://lore.kernel.org/r/1579439601-14810-3-git-send-email-sricharan@codeaurora.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Light Hsieh

commit sha 3de7deefce693bb9783bca4cb42a81653ebec4e9

pinctrl: mediatek: Check gpio pin number and use binary search in mtk_hw_pin_field_lookup() 1. Check if gpio pin number is in valid range to prevent from get invalid pointer 'desc' in the following code: desc = (const struct mtk_pin_desc *)&hw->soc->pins[gpio]; 2. Improve mtk_hw_pin_field_lookup() 2.1 Modify mtk_hw_pin_field_lookup() to use binary search for accelerating search. 2.2 Correct message after the following check fail: if (hw->soc->reg_cal && hw->soc->reg_cal[field].range) { rc = &hw->soc->reg_cal[field]; The original message is: "Not support field %d for pin %d (%s)\n" However, the check is on soc chip level, not on pin level yet. So the message is corrected as: "Not support field %d for this soc\n" Signed-off-by: Light Hsieh <light.hsieh@mediatek.com> Link: https://lore.kernel.org/r/1579675994-7001-1-git-send-email-light.hsieh@mediatek.com Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Light Hsieh

commit sha 5f755e1f1efe5ca3b475b14169e6e85bf1411bb5

pinctrl: mediatek: Supporting driving setting without mapping current to register value MediaTek's smartphone project actual usage does need to know current value (in mA) in procedure of finding the best driving setting. The steps in the procedure is like as follow: 1. set driving setting field in setting register as 0, measure waveform, perform test, and etc. 2. set driving setting field in setting register as 1, measure waveform, perform test, and etc. ... n. set driving setting field in setting register as n-1, measure waveform, perform test, and etc. Check the results of steps 1~n and adopt the setting that get best result. This procedure does need to know the mapping between current to register value. Therefore, setting driving without mapping current is more practical for MediaTek's smartphone usage. Signed-off-by: Light Hsieh <light.hsieh@mediatek.com> Link: https://lore.kernel.org/r/1579675994-7001-2-git-send-email-light.hsieh@mediatek.com Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Light Hsieh

commit sha 3599cc525486be6681640ff3083376c001264c61

pinctrl: mediatek: Refine mtk_pinconf_get() and mtk_pinconf_set() 1.Refine mtk_pinconf_get(): Use only one occurrence of return at end of this function. 2.Refine mtk_pinconf_set(): 2.1 Use only one occurrence of return at end of this function. 2.2 Modify case of PIN_CONFIG_INPUT_ENABLE - 2.2.1 Regard all non-zero setting value as enable, instead of always enable. 2.2.2 Remove check of ies_present flag and always invoke mtk_hw_set_value() since mtk_hw_pin_field_lookup() invoked inside mtk_hw_set_value() has the same effect of checking if ies control is supported. [The rationale is that: available of a control is always checked in mtk_hw_pin_field_lookup() and no need to add ies_present flag specially for ies control.] 2.3 Simply code logic for case of PIN_CONFIG_INPUT_SCHMITT. 2.4 Add case for PIN_CONFIG_INPUT_SCHMITT_ENABLE and process it with the same code for case of PIN_CONFIG_INPUT_SCHMITT. Signed-off-by: Light Hsieh <light.hsieh@mediatek.com> Link: https://lore.kernel.org/r/1579675994-7001-3-git-send-email-light.hsieh@mediatek.com Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Light Hsieh

commit sha 1bea6afbc84206cd939ae227cf81d6c824af6fd7

pinctrl: mediatek: Refine mtk_pinconf_get() Correct cases for PIN_CONFIG_SLEW_RATE, PIN_CONFIG_INPUT_SCHMITT_ENABLE, and PIN_CONFIG_OUTPUT_ENABLE - Use variable ret to receive value in mtk_hw_get_value() (instead of variable val) since pinconf_to_config_packed() at end of this function use variable ret to pack config value. Signed-off-by: Light Hsieh <light.hsieh@mediatek.com> Link: https://lore.kernel.org/r/1579675994-7001-4-git-send-email-light.hsieh@mediatek.com Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Light Hsieh

commit sha cafe19db7751269bf6b4dd2148cbfa9fbe91d651

pinctrl: mediatek: Backward compatible to previous Mediatek's bias-pull usage Refine mtk_pinconf_set()/mtk_pinconf_get() for backward compatibility to previous MediaTek's bias-pull usage. In PINCTRL_MTK that use pinctrl-mtk-common.c, bias-pull setting for pins with 2 pull resistors can be specified as value for bias-pull-up and bias-pull-down. For example: bias-pull-up = <MTK_PUPD_SET_R1R0_00>; bias-pull-up = <MTK_PUPD_SET_R1R0_01>; bias-pull-up = <MTK_PUPD_SET_R1R0_10>; bias-pull-up = <MTK_PUPD_SET_R1R0_11>; bias-pull-down = <MTK_PUPD_SET_R1R0_00>; bias-pull-down = <MTK_PUPD_SET_R1R0_01>; bias-pull-down = <MTK_PUPD_SET_R1R0_10>; bias-pull-down = <MTK_PUPD_SET_R1R0_11>; On the other hand, PINCTRL_MTK_PARIS use customized properties "mediatek,pull-up-adv" and "mediatek,pull-down-adv" to specify bias-pull setting for pins with 2 pull resistors. This introduce in-compatibility in device tree and increase porting effort to MediaTek's customer that had already used PINCTRL_MTK version. Besides, if customers are not aware of this change and still write devicetree for PINCTRL_MTK version, they may encounter runtime failure with pinctrl and spent time to debug. This patch adds backward compatible to previous MediaTek's bias-pull usage so that Mediatek's customer need not use a new devicetree property name. The rationale is that: changing driver implementation had better leave interface unchanged. Signed-off-by: Light Hsieh <light.hsieh@mediatek.com> Link: https://lore.kernel.org/r/1579675994-7001-5-git-send-email-light.hsieh@mediatek.com Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Light Hsieh

commit sha 184d8e13f9b13313f711f028ca2465f973459046

pinctrl: mediatek: Add support for pin configuration dump via debugfs. Add support for pin configuration dump via catting /sys/kernel/debug/pinctrl/$platform_dependent_path/pinconf-pins. pinctrl framework had already support such dump. This patch implement the operation function pointer to fullfill this dump. Signed-off-by: Light Hsieh <light.hsieh@mediatek.com> Link: https://lore.kernel.org/r/1579675994-7001-6-git-send-email-light.hsieh@mediatek.com Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

Matheus Castello

commit sha 6f87359e8bcaf88381b9c9c038929e0e6872d308

pinctrl: actions: Fix functions groups names for S700 SoC Group names by function do not match their respective structures and documentation defined names. This fixes following errors when groups names defined on documentation are used: [ 4.262778] pinctrl-s700 e01b0000.pinctrl: invalid group "sd0_d1_mfp" for function "sd0" [ 4.271394] pinctrl-s700 e01b0000.pinctrl: invalid group "sd0_d2_d3_mfp" for function "sd0" [ 4.280248] pinctrl-s700 e01b0000.pinctrl: invalid group "sd1_d0_d3_mfp" for function "sd0" [ 4.289122] pinctrl-s700 e01b0000.pinctrl: invalid group "sd0_cmd_mfp" for function "sd0" Fixes: 81c9d563cc74 (pinctrl: actions: Add Actions Semi S700 pinctrl driver) Signed-off-by: Matheus Castello <matheus@castello.eng.br> Link: https://lore.kernel.org/r/20200124133758.10089-1-matheus@castello.eng.br Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

view details

push time in 4 days

push eventtorvalds/linux

Andrew Jones

commit sha f09ab268bbb26d5d851636801cafe73456ff73ab

KVM: selftests: aarch64: Use stream when given I'm not sure how we ended up using printf instead of fprintf in virt_dump(). Fix it. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

view details

Andrew Jones

commit sha 10d1a71b164e497587fd4d0bbf2537568d9f01d9

KVM: selftests: Remove unnecessary defines BITS_PER_LONG and friends are provided by linux/bitops.h Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

view details

Andrew Jones

commit sha 12c0d0f6d9df3a657817d639d236f3a9755640e4

KVM: selftests: aarch64: Remove unnecessary ifdefs Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

view details

Andrew Jones

commit sha f832485df2d46a43c2ef10be2676e3b5b5c7e7bb

KVM: selftests: Rename vm_guest_mode_params We're going to want this name in the library code, so use a shorter name in the tests. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

view details

Andrew Jones

commit sha 377a41c9ef84181bff5a3af2da9dfd21d6a08911

KVM: selftests: Introduce vm_guest_mode_params This array will allow us to easily translate modes to their parameter values. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

view details

Andrew Jones

commit sha 87a802d93e7ef55216d8884fdf7e5f491a6fe501

KVM: selftests: Introduce num-pages conversion utilities Guests and hosts don't have to have the same page size. This means calculations are necessary when selecting the number of guest pages to allocate in order to ensure the number is compatible with the host. Provide utilities to help with those calculations and apply them where appropriate. We also revert commit bffed38d4fb5 ("kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size") and then use vm_adjust_num_guest_pages() there instead. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

view details

Ben Gardon

commit sha 025eed7b3519be30cc2310711137ab4ff827fbe3

KVM: selftests: Create a demand paging test While userfaultfd, KVM's demand paging implementation, is not specific to KVM, having a benchmark for its performance will be useful for guiding performance improvements to KVM. As a first step towards creating a userfaultfd demand paging test, create a simple memory access test, based on dirty_log_test. Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

view details

Vasily Gorbik

commit sha ecdc5d842bb3c166c3d549e52ba91a3955b257f2

s390/protvirt: introduce host side setup Add "prot_virt" command line option which controls if the kernel protected VMs support is enabled at early boot time. This has to be done early, because it needs large amounts of memory and will disable some features like STP time sync for the lpar. Extend ultravisor info definitions and expose it via uv_info struct filled in during startup. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Vasily Gorbik

commit sha 29d37e5b82f3e96dd648167657d5a0e0111ce877

s390/protvirt: add ultravisor initialization Before being able to host protected virtual machines, donate some of the memory to the ultravisor. Besides that the ultravisor might impose addressing limitations for memory used to back protected VM storage. Treat that limit as protected virtualization host's virtual memory limit. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Claudio Imbrenda

commit sha 214d9bbcd3a67230b932f6cea83c078ab34d9e70

s390/mm: provide memory management functions for protected KVM guests This provides the basic ultravisor calls and page table handling to cope with secure guests: - provide arch_make_page_accessible - make pages accessible after unmapping of secure guests - provide the ultravisor commands convert to/from secure - provide the ultravisor commands pin/unpin shared - provide callbacks to make pages secure (inacccessible) - we check for the expected pin count to only make pages secure if the host is not accessing them - we fence hugetlbfs for secure pages - add missing radix-tree include into gmap.h The basic idea is that a page can have 3 states: secure, normal or shared. The hypervisor can call into a firmware function called ultravisor that allows to change the state of a page: convert from/to secure. The convert from secure will encrypt the page and make it available to the host and host I/O. The convert to secure will remove the host capability to access this page. The design is that on convert to secure we will wait until writeback and page refs are indicating no host usage. At the same time the convert from secure (export to host) will be called in common code when the refcount or the writeback bit is already set. This avoids races between convert from and to secure. Then there is also the concept of shared pages. Those are kind of secure where the host can still access those pages. We need to be notified when the guest "unshares" such a page, basically doing a convert to secure by then. There is a call "pin shared page" that we use instead of convert from secure when possible. We do use PG_arch_1 as an optimization to minimize the convert from secure/pin shared. Several comments have been added in the code to explain the logic in the relevant places. Co-developed-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Vasily Gorbik

commit sha 084ea4d611a3d00ee3930400b262240e10895900

s390/mm: add (non)secure page access exceptions handlers Add exceptions handlers performing transparent transition of non-secure pages to secure (import) upon guest access and secure pages to non-secure (export) upon hypervisor access. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> [frankja@linux.ibm.com: adding checks for failures] Signed-off-by: Janosch Frank <frankja@linux.ibm.com> [imbrenda@linux.ibm.com: adding a check for gmap fault] Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Janosch Frank

commit sha a0f60f8431999bf57cf53c3b27c47ef156b4fa17

s390/protvirt: Add sysfs firmware interface for Ultravisor information That information, e.g. the maximum number of guests or installed Ultravisor facilities, is interesting for QEMU, Libvirt and administrators. Let's provide an easily parsable API to get that information. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Christian Borntraeger

commit sha f15587c83460e4be92f0a5d7ae73e26eab13254d

Merge branch 'pvbase' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD s390 base parts (non kvm) for protvirt

view details

Ulrich Weigand

commit sha f65470661f3648fe6d3d13475d01a744bb14f8b4

KVM: s390/interrupt: do not pin adapter interrupt pages The adapter interrupt page containing the indicator bits is currently pinned. That means that a guest with many devices can pin a lot of memory pages in the host. This also complicates the reference tracking which is needed for memory management handling of protected virtual machines. It might also have some strange side effects for madvise MADV_DONTNEED and other things. We can simply try to get the userspace page set the bits and free the page. By storing the userspace address in the irq routing entry instead of the guest address we can actually avoid many lookups and list walks so that this variant is very likely not slower. If userspace messes around with the memory slots the worst thing that can happen is that we write to some other memory within that process. As we get the the page with FOLL_WRITE this can also not be used to write to shared read-only pages. Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch simplification] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Janosch Frank

commit sha 3e6c556899d02e04d3d65f0e12adfbe05a557832

KVM: s390: protvirt: Add UV debug trace Let's have some debug traces which stay around for longer than the guest. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Janosch Frank

commit sha 6933316fe011d5875b360ea8b7404a6612846740

KVM: s390: add new variants of UV CALL This adds two new helper functions for doing UV CALLs. The first variant handles UV CALLs that might have longer busy conditions or just need longer when doing partial completion. We should schedule when necessary. The second variant handles UV CALLs that only need the handle but have no payload (e.g. destroying a VM). We can provide a simple wrapper for those. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Janosch Frank

commit sha 29b40f105ec8d555984c1f72dc9133b122e51903

KVM: s390: protvirt: Add initial vm and cpu lifecycle handling This contains 3 main changes: 1. changes in SIE control block handling for secure guests 2. helper functions for create/destroy/unpack secure guests 3. KVM_S390_PV_COMMAND ioctl to allow userspace dealing with secure machines Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Janosch Frank

commit sha fa0c5eabbdd33012b369cf75d6a39389cc9ae707

KVM: s390: protvirt: Secure memory is not mergeable KSM will not work on secure pages, because when the kernel reads a secure page, it will be encrypted and hence no two pages will look the same. Let's mark the guest pages as unmergeable when we transition to secure mode. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

Christian Borntraeger

commit sha 1274800792dced8e5b6d54c71ec049c4d1e34189

KVM: s390/mm: Make pages accessible before destroying the guest Before we destroy the secure configuration, we better make all pages accessible again. This also happens during reboot, where we reboot into a non-secure guest that then can go again into secure mode. As this "new" secure guest will have a new ID we cannot reuse the old page state. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>

view details

Janosch Frank

commit sha 49710db081699a14f4ba49cb0c02d0571a341449

KVM: s390: protvirt: Handle SE notification interceptions Since there is no interception for load control and load psw instruction in the protected mode, we need a new way to get notified whenever we can inject an IRQ right after the guest has just enabled the possibility for receiving them. The new interception codes solve that problem by providing a notification for changes to IRQ enablement relevant bits in CRs 0, 6 and 14, as well a the machine check mask bit in the PSW. No special handling is needed for these interception codes, the KVM pre-run code will consult all necessary CRs and PSW bits and inject IRQs the guest is enabled for. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

view details

push time in 4 days

push eventtorvalds/linux

Al Viro

commit sha 8f11538ebe984e5434eeda4c7183d165cddb5936

do_add_mount(): lift lock_mount/unlock_mount into callers preparation to finish_automount() fix (next commit) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Eric W. Biederman

commit sha 0afa5ca82212247456f9de1468b595a111fee633

proc: Rename in proc_inode rename sysctl_inodes sibling_inodes I about to need and use the same functionality for pid based inodes and there is no point in adding a second field when this field is already here and serving the same purporse. Just give the field a generic name so it is clear that it is no longer sysctl specific. Also for good measure initialize sibling_inodes when proc_inode is initialized. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha 26dbc60f385ff9cff475ea2a3bad02e80fd6fa43

proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache This prepares the way for allowing the pid part of proc to use this dcache pruning code as well. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha 080f6276fccfb3923691e57c1b44a627eabd1a25

proc: In proc_prune_siblings_dcache cache an aquired super block Because there are likely to be several sysctls in a row on the same superblock cache the super_block after the count has been raised and don't deactivate it until we are processing another super_block. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha f90f3cafe8d56d593fc509a4185da1d5800efea4

proc: Use d_invalidate in proc_prune_siblings_dcache The function d_prune_aliases has the problem that it will only prune aliases thare are completely unused. It will not remove aliases for the dcache or even think of removing mounts from the dcache. For that behavior d_invalidate is needed. To use d_invalidate replace d_prune_aliases with d_find_alias followed by d_invalidate and dput. For completeness the directory and the non-directory cases are separated because in theory (although not in currently in practice for proc) directories can only ever have a single dentry while non-directories can have hardlinks and thus multiple dentries. As part of this separation use d_find_any_alias for directories to spare d_find_alias the extra work of doing that. Plus the differences between d_find_any_alias and d_find_alias makes it clear why the directory and non-directory code and not share code. To make it clear these routines now invalidate dentries rename proc_prune_siblings_dache to proc_invalidate_siblings_dcache, and rename proc_sys_prune_dcache proc_sys_invalidate_dcache. V2: Split the directory and non-directory cases. To make this code robust to future changes in proc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha 71448011ea2a1cd36d8f5cbdab0ed716c454d565

proc: Clear the pieces of proc_inode that proc_evict_inode cares about This just keeps everything tidier, and allows for using flags like SLAB_TYPESAFE_BY_RCU where slabs are not always cleared before reuse. I don't see reuse without reinitializing happening with the proc_inode but I had a false alarm while reworking flushing of proc dentries and indoes when a process dies that caused me to tidy this up. The code is a little easier to follow and reason about this way so I figured the changes might as well be kept. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha 7bc3e6e55acf065500a24621f3b313e7e5998acf

proc: Use a list of inodes to flush from proc Rework the flushing of proc to use a list of directory inodes that need to be flushed. The list is kept on struct pid not on struct task_struct, as there is a fixed connection between proc inodes and pids but at least for the case of de_thread the pid of a task_struct changes. This removes the dependency on proc_mnt which allows for different mounts of proc having different mount options even in the same pid namespace and this allows for the removal of proc_mnt which will trivially the first mount of proc to honor it's mount options. This flushing remains an optimization. The functions pid_delete_dentry and pid_revalidate ensure that ordinary dcache management will not attempt to use dentries past the point their respective task has died. When unused the shrinker will eventually be able to remove these dentries. There is a case in de_thread where proc_flush_pid can be called early for a given pid. Which winds up being safe (if suboptimal) as this is just an optiimization. Only pid directories are put on the list as the other per pid files are children of those directories and d_invalidate on the directory will get them as well. So that the pid can be used during flushing it's reference count is taken in release_task and dropped in proc_flush_pid. Further the call of proc_flush_pid is moved after the tasklist_lock is released in release_task so that it is certain that the pid has already been unhashed when flushing it taking place. This removes a small race where a dentry could recreated. As struct pid is supposed to be small and I need a per pid lock I reuse the only lock that currently exists in struct pid the the wait_pidfd.lock. The net result is that this adds all of this functionality with just a little extra list management overhead and a single extra pointer in struct pid. v2: Initialize pid->inodes. I somehow failed to get that initialization into the initial version of the patch. A boot failure was reported by "kernel test robot <lkp@intel.com>", and failure to initialize that pid->inodes matches all of the reported symptoms. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

view details

Al Viro

commit sha 26df6034fdb211857e069e7b07d25a368da646df

fix automount/automount race properly Protection against automount/automount races (two threads hitting the same referral point at the same time) is based upon do_add_mount() prevention of identical overmounts - trying to overmount the root of mounted tree with the same tree fails with -EBUSY. It's unreliable (the other thread might've mounted something on top of the automount it has triggered) *and* causes no end of headache for follow_automount() and its caller, since finish_automount() behaves like do_new_mount() - if the mountpoint to be is overmounted, it mounts on top what's overmounting it. It's not only wrong (we want to go into what's overmounting the automount point and quietly discard what we planned to mount there), it introduces the possibility of original parent mount getting dropped. That's what 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount) deals with, but it can't do anything about the reliability of conflict detection - if something had been overmounted the other thread's automount (e.g. that other thread having stepped into automount in mount(2)), we don't get that -EBUSY and the result is referral point under automounted NFS under explicit overmount under another copy of automounted NFS What we need is finish_automount() *NOT* digging into overmounts - if it finds one, it should just quietly discard the thing it was asked to mount. And don't bother with actually crossing into the results of finish_automount() - the same loop that calls follow_automount() will do that just fine on the next iteration. IOW, instead of calling lock_mount() have finish_automount() do it manually, _without_ the "move into overmount and retry" part. And leave crossing into the results to the caller of follow_automount(), which simplifies it a lot. Moral: if you end up with a lot of glue working around the calling conventions of something, perhaps these calling conventions are simply wrong... Fixes: 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Al Viro

commit sha 25e195aa1e607f129ab912d29fcfc79239703307

follow_automount(): get rid of dead^Wstillborn code 1) no instances of ->d_automount() have ever made use of the "return ERR_PTR(-EISDIR) if you don't feel like mounting anything" - that's a rudiment of plans that got superseded before the thing went into the tree. Despite the comment in follow_automount(), autofs has never done that. 2) if there's no ->d_automount() in dentry_operations, filesystems should not set DCACHE_NEED_AUTOMOUNT in the first place. None have ever done so... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Al Viro

commit sha 1c9f5e06a613cc48608db1a4207b8bd5ebe70a81

follow_automount() doesn't need the entire nameidata Only the address of ->total_link_count and the flags. And fix an off-by-one is ELOOP detection - make it consistent with symlink following, where we check if the pre-increment value has reached 40, rather than check the post-increment one. [kudos to Christian Brauner for spotted braino] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Al Viro

commit sha 31d1726d7250021c66c9f16d8a128444676db782

make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW O_CREAT | O_EXCL means "-EEXIST if we run into a trailing symlink". As it is, we might or might not have LOOKUP_FOLLOW in op->intent in that case - that depends upon having O_NOFOLLOW in open flags. It doesn't matter, since we won't be checking it in that case - do_last() bails out earlier. However, making sure it's not set (i.e. acting as if we had an explicit O_NOFOLLOW) makes the behaviour more explicit and allows to reorder the check for O_CREAT | O_EXCL in do_last() with the call of step_into() immediately following it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Al Viro

commit sha bd7c4b508344680c843e2d2436d56b9fc8aedc9d

handle_mounts(): start building a sane wrapper for follow_managed() All callers of follow_managed() follow it on success with the same steps - d_backing_inode(path->dentry) is calculated and stored into some struct inode * variable and, in all but one case, an unsigned variable (nd->seq to be) is zeroed. The single exception is lookup_fast() and there zeroing is correct thing to do - not doing it is a pointless microoptimization. Add a wrapper for follow_managed() that would do that combination. It's mostly a vehicle for code massage - it will be changing quite a bit, and the current calling conventions are by no means final. Right now it takes path, nameidata and (as out params) inode and seq, similar to __follow_mount_rcu(). Which will soon get folded into it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Al Viro

commit sha 239eb983383b1b140e588fe8f1c7d1ce953285f6

atomic_open(): saner calling conventions (return dentry on success) Currently it either returns -E... or puts (nd->path.mnt,dentry) into *path and returns 0. Make it return ERR_PTR(-E...) or dentry; adjust the caller. Fewer arguments and it's easier to keep track of *path contents that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Eric W. Biederman

commit sha a13ae6971599dd01a5fa8da9ee1bd5bb3efa01b3

proc: Dentry flushing without proc_mnt Cleanly handling proc mount options require the internal mount of proc to be removed (so mount options are not ignored), and quite possibly multiple proc superblocks per pid namespace (so a second mount of proc does not silently get the mount options of the first mount of proc. In either case being able to flush proc dentries on process exit needs to be made to work without going through proc_mnt. After serveral discussions this is the set of changes that work and no one objects to. --- I have addressed all of the review comments as I understand them, and fixed the small oversight the kernel test robot was able to find. (I had failed to initialize the new field pid->inodes). I did not hear any concerns from the 10,000 foot level last time so I am assuming this set of changes (baring bugs) is good to go. Unless some new issues appear my plan is to put this in my tree and get this into linux-next. Which will give Alexey something to build his changes on. I tested this set of changes by running: (while ls -1 -f /proc > /dev/null ; do :; done ) & And monitoring the amount of free memory. With the flushing disabled I saw the used memory in the system grow by 20M before the shrinker would bring it back down to where it started. With the patch applied I saw the memory usage stay essentially fixed. So flushing definitely keeps things working better. Eric W. Biederman (6): proc: Rename in proc_inode rename sysctl_inodes sibling_inodes proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache proc: In proc_prune_siblings_dcache cache an aquired super block proc: Use d_invalidate in proc_prune_siblings_dcache proc: Clear the pieces of proc_inode that proc_evict_inode cares about proc: Use a list of inodes to flush from proc fs/proc/base.c | 111 ++++++++++++++++-------------------------------- fs/proc/inode.c | 73 ++++++++++++++++++++++++++++--- fs/proc/internal.h | 4 +- fs/proc/proc_sysctl.c | 45 +++----------------- include/linux/pid.h | 1 + include/linux/proc_fs.h | 4 +- kernel/exit.c | 4 +- kernel/pid.c | 1 + 8 files changed, 120 insertions(+), 123 deletions(-) Link: https://lore.kernel.org/lkml/871rqk2brn.fsf_-_@x220.int.ebiederm.org/ Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Merge branch 'proc-dentry-flushing-without-proc-mnt-v2' into HEAD

view details

Eric W. Biederman

commit sha af1abab986b85fdeb3e98c037988cfa4bf9d4018

uml: Don't consult current to find the proc_mnt in mconsole_proc Inspection of the control flow reveals that mconsole_proc is either called from mconsole_stop called from mc_work_proc or from mc_work_proc directly. The function mc_work_proc is dispatched to a kernel thread with schedule_work. All of the threads that run dispatched by schedule_work are in the init pid namespace. So make the code clearer and by using init_pid_ns instead of task_active_pid_ns(current). Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha 76313c70c52f930af4afd21684509ca52297ea71

uml: Create a private mount of proc for mconsole The mconsole code only ever accesses proc for the initial pid namespace. Instead of depending upon the proc_mnt which is for proc_flush_task have uml create it's own mount of proc instead. This allows proc_flush_task to evolve and remove the need for having a proc_mnt to do it's job. Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha 69879c01a0c3f70e0887cfb4d9ff439814361e46

proc: Remove the now unnecessary internal mount of proc There remains no more code in the kernel using pids_ns->proc_mnt, therefore remove it from the kernel. The big benefit of this change is that one of the most error prone and tricky parts of the pid namespace implementation, maintaining kernel mounts of proc is removed. In addition removing the unnecessary complexity of the kernel mount fixes a regression that caused the proc mount options to be ignored. Now that the initial mount of proc comes from userspace, those mount options are again honored. This fixes Android's usage of the proc hidepid option. Reported-by: Alistair Strachan <astrachan@google.com> Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns.") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha af9fe6d607c9f456fb6c1cb87e1dc66a43846efd

pid: Improve the comment about waiting in zap_pid_ns_processes Oleg wrote a very informative comment, but with the removal of proc_cleanup_work it is no longer accurate. Rewrite the comment so that it only talks about the details that are still relevant, and hopefully is a little clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

view details

Eric W. Biederman

commit sha a0d4a141750df51135499f96c355c4d76add5505

Proc mount option handling is broken, and it has been since I accidentally broke it in the middle 2016. The problem is that because we perform an internal mount of proc before user space mounts proc all of the mount options that user specifies when mounting proc are ignored. You can set those mount options with a remount but that is rather surprising. This most directly affects android which is using hidpid=2 by default. Now that the sysctl system call support has been removed, and we have settled on way of flushing proc dentries when a process exits without using proc_mnt, there is an simple and easy fix. a) Give UML mconsole it's own private mount of proc to use. b) Stop creating the internal mount of proc We still need Alexey Gladkov's full patch to get proc mount options to work inside of UML, and to be generally useful. This set of changes is just enough to get them working as well as they have in the past. If anyone sees any problem with this code please let me know. Otherwise I plan to merge these set of fixes through my tree. Link: https://lore.kernel.org/lkml/87r21tuulj.fsf@x220.int.ebiederm.org/ Link: https://lore.kernel.org/lkml/871rqk2brn.fsf_-_@x220.int.ebiederm.org/ Link: https://lore.kernel.org/lkml/20200210150519.538333-1-gladkov.alexey@gmail.com/ Link: https://lore.kernel.org/lkml/20180611195744.154962-1-astrachan@google.com/ Fixes: e94591d0d90c ("proc: Convert proc_mount to use mount_ns.") Eric W. Biederman (4): uml: Don't consult current to find the proc_mnt in mconsole_proc uml: Create a private mount of proc for mconsole proc: Remove the now unnecessary internal mount of proc pid: Improve the comment about waiting in zap_pid_ns_processes arch/um/drivers/mconsole_kern.c | 28 +++++++++++++++++++++++++++- fs/proc/root.c | 36 ------------------------------------ include/linux/pid_namespace.h | 2 -- include/linux/proc_ns.h | 5 ----- kernel/pid.c | 8 -------- kernel/pid_namespace.c | 38 +++++++++++++++++++------------------- 6 files changed, 46 insertions(+), 71 deletions(-)

view details

Brian Foster

commit sha 6b789c337a5963ae57cbc7fe9e41488c40a9b014

xfs: fix iclog release error check race with shutdown Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with l_icloglock held"), xlog_state_release_iclog() always performed a locked check of the iclog error state before proceeding into the sync state processing code. As of this commit, part of xlog_state_release_iclog() was open-coded into xfs_log_release_iclog() and as a result the locked error state check was lost. The lockless check still exists, but this doesn't account for the possibility of a race with a shutdown being performed by another task causing the iclog state to change while the original task waits on ->l_icloglock. This has reproduced very rarely via generic/475 and manifests as an assert failure in __xlog_state_release_iclog() due to an unexpected iclog state. Restore the locked error state check in xlog_state_release_iclog() to ensure that an iclog state update via shutdown doesn't race with the iclog release state processing code. Fixes: df732b29c807 ("xfs: call xlog_state_release_iclog with l_icloglock held") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

view details

push time in 4 days

push eventtorvalds/linux

Matthew Wilcox (Oracle)

commit sha bd40b17ca49d7d110adf456e647701ce74de2241

XArray: Fix xa_find_next for large multi-index entries Coverity pointed out that xas_sibling() was shifting xa_offset without promoting it to an unsigned long first, so the shift could cause an overflow and we'd get the wrong answer. The fix is obvious, and the new test-case provokes UBSAN to report an error: runtime error: shift exponent 60 is too large for 32-bit type 'int' Fixes: 19c30f4dd092 ("XArray: Fix xa_find_after with multi-index entries") Reported-by: Bjorn Helgaas <bhelgaas@google.com> Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: stable@vger.kernel.org

view details

Matthew Wilcox (Oracle)

commit sha c36d451ad386b34f452fc3c8621ff14b9eaa31a6

XArray: Fix xas_pause for large multi-index entries Inspired by the recent Coverity report, I looked for other places where the offset wasn't being converted to an unsigned long before being shifted, and I found one in xas_pause() when the entry being paused is of order >32. Fixes: b803b42823d0 ("xarray: Add XArray iterators") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: stable@vger.kernel.org

view details

Chengguang Xu

commit sha 24a448b165253b6f2ab1e0bcdba9a733007681d6

XArray: Fix incorrect comment in header file 256 means Retry entry and 257 means Zero entry, so fix it. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

view details

Alex Shi

commit sha 3a00e7c47c382b30524e78b36ab047c16b8fcfef

ida: remove abandoned macros 3 IDA_ started macros aren't used from commit f32f004cddf8 ("ida: Convert to XArray"). so better to remove them. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

view details

Parav Pandit

commit sha 081ea5195a11c9f1eaa8393be603b75982f91b7d

RDMA/cma: Use a helper function to enqueue resolve work items To avoid errors, with attaching ownership of work item and its cm_id refcount which is decremented in work handler, tie them up in single helper function. Also avoid code duplication. Link: https://lore.kernel.org/r/20200126142652.104803-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Parav Pandit

commit sha cc055dd3a71352759a6c7ecaee612eeaef93ef22

RDMA/cma: Use RDMA device port iterator Use RDMA device port iterator to avoid open coding. Link: https://lore.kernel.org/r/20200126142652.104803-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Parav Pandit

commit sha 5ff8c8fa44c2cb74f3066ec4a531265db69b86c5

RDMA/cma: Rename cma_device ref/deref helpers to to get/put Helper functions which increment/decrement reference count of the structure read better when they are named with the get/put suffix. Hence, rename cma_ref/deref_dev() to cma_dev_get/put(). Link: https://lore.kernel.org/r/20200126142652.104803-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Parav Pandit

commit sha be439912e7c2e3e78ebd087932c165a83bdca6b5

RDMA/cma: Use refcount API to reflect refcount Use the refcount variant to capture the reference counting of the cma device structure. Link: https://lore.kernel.org/r/20200126142652.104803-6-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Parav Pandit

commit sha e368d23f57f6a08341d35c44255f2d8e7695152b

RDMA/cma: Rename cma_device ref/deref helpers to to get/put Helper functions which increment/decrement reference count of a structure read better when they are named with the get/put suffix. Hence, rename cma_ref/deref_id() to cma_id_get/put(). Also use cma_get_id() wrapper to find the balancing put() calls. Link: https://lore.kernel.org/r/20200126142652.104803-7-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Parav Pandit

commit sha 43fb5892cdfaa3bbe170aade07d4a38086636cca

RDMA/cma: Use refcount API to reflect refcount Use a refcount_t for atomics being used as a refcount. Link: https://lore.kernel.org/r/20200126142652.104803-8-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Lang Cheng

commit sha b14c95bee83537aff423cece4ae6078759bc6f34

RDMA/hns: Cleanups of magic numbers Some magic numbers are hard to understand, so replace them with macros or add some comments for them. Link: https://lore.kernel.org/r/20200126145504.9700-1-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Xi Wang

commit sha d7e2d3432ae7795e5b77a77039e904ed4a769f39

RDMA/hns: Optimize eqe buffer allocation flow The eqe has a private multi-hop addressing implementation, but there is already a set of interfaces in the hns driver that can achieve this. So, simplify the eqe buffer allocation process by using the mtr interface and remove large amount of repeated logic. Link: https://lore.kernel.org/r/20200126145835.11368-1-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Shiraz Saleem

commit sha 9a4b24108d92ffaa886e37923088bd806b988948

i40iw: Do an RCU lookup in i40iw_add_ipv4_addr The in_dev_for_each_ifa_rtnl() iterator in i40iw_add_ipv4_addr requires that the rtnl lock be held. But the rtnl_trylock/unlock scheme in this function does not guarantee it. Replace the rtnl locking with an RCU lookup using in_dev_for_each_ifa_rcu() Fixes: 8e06af711bf2 ("i40iw: add main, hdr, status") Link: https://lore.kernel.org/r/20200204223840.2151-1-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Kamal Heib

commit sha beb205dd67aaa4315dedf5c40b47c6e9dee5a469

RDMA/siw: Fix setting active_mtu attribute Make sure to set the active_mtu attribute to avoid report the following invalid value: $ ibv_devinfo -d siw0 | grep active_mtu active_mtu: invalid MTU (0) Fixes: 303ae1cdfdf7 ("rdma/siw: application interface") Link: https://lore.kernel.org/r/20200205081354.30438-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Leon Romanovsky

commit sha ca750d4a9c426854e221ca404fbefeab9800134e

RDMA/ucma: Mask QPN to be 24 bits according to IBTA IBTA declares QPN as 24bits, mask input to ensure that kernel doesn't get higher bits and ensure by adding WANR_ONCE() that other CM users do the same. Link: https://lore.kernel.org/r/20200212072635.682689-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Michael Guralnik

commit sha f03d9fadfe13a78ee28fec320d43f7b37574adcb

RDMA/core: Add weak ordering dma attr to dma mapping For memory regions registered with IB_ACCESS_RELAXED_ORDERING will be dma mapped with the DMA_ATTR_WEAK_ORDERING. This will allow reads and writes to the mapping to be weakly ordered, such change can enhance performance on some supporting architectures. Link: https://lore.kernel.org/r/20200212073559.684139-1-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Yixian Liu

commit sha ffd541d45726341c1830ff595fd7352b6d1cfbcd

RDMA/hns: Add the workqueue framework for flush cqe handler HiP08 RoCE hardware lacks ability(a known hardware problem) to flush outstanding WQEs if QP state gets into errored mode for some reason. To overcome this hardware problem and as a workaround, when QP is detected to be in errored state during various legs like post send, post receive etc [1], flush needs to be performed from the driver. The earlier patch[1] sent to solve the hardware limitation explained in the cover-letter had a bug in the software flushing leg. It acquired mutex while modifying QP state to errored state and while conveying it to the hardware using the mailbox. This caused leg to sleep while holding spin-lock and caused crash. Suggested Solution: we have proposed to defer the flushing of the QP in the Errored state using the workqueue to get around with the limitation of our hardware. This patch adds the framework of the workqueue and the flush handler function. [1] https://patchwork.kernel.org/patch/10534271/ Link: https://lore.kernel.org/r/1580983005-13899-2-git-send-email-liuyixian@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Yixian Liu

commit sha b53742865e9f09cbba4d9daa161760ec23f36aa4

RDMA/hns: Delayed flush cqe process with workqueue HiP08 RoCE hardware lacks ability(a known hardware problem) to flush outstanding WQEs if QP state gets into errored mode for some reason. To overcome this hardware problem and as a workaround, when QP is detected to be in errored state during various legs like post send, post receive etc[1], flush needs to be performed from the driver. The earlier patch[1] sent to solve the hardware limitation explained in the cover-letter had a bug in the software flushing leg. It acquired mutex while modifying QP state to errored state and while conveying it to the hardware using the mailbox. This caused leg to sleep while holding spin-lock and caused crash. Suggested Solution: we have proposed to defer the flushing of the QP in the Errored state using the workqueue to get around with the limitation of our hardware. This patch specifically adds the calls to the flush handler from where parts of the code like post_send/post_recv etc. when the QP state gets into the errored mode. [1] https://patchwork.kernel.org/patch/10534271/ Link: https://lore.kernel.org/r/1580983005-13899-3-git-send-email-liuyixian@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Reviewed-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Jason Gunthorpe

commit sha b72bfc965eb5d3475acabb038a1f9f6034c4658d

RDMA/core: Get rid of ib_create_qp_user This function accepts a udata but does nothing with it, and is never passed a !NULL udata. Rename it to ib_create_qp which was the only caller and remove the udata. Link: https://lore.kernel.org/r/20200213191911.GA9898@ziepe.ca Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

Jason Gunthorpe

commit sha 167b95ec887029c5fe0fb0f601fb209323f498fb

RDMA/ucma: Use refcount_t for the ctx->ref Don't use an atomic as a refcount. Link: https://lore.kernel.org/r/20200218191657.GA29724@ziepe.ca Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

view details

push time in 5 days

push eventtorvalds/linux

Fenghua Yu

commit sha 034c7678dd2c05fb08e6fa3b0ac527ead8f1b11b

selftests/resctrl: Add README for resctrl tests resctrl tests will be implemented. README is added for the tool first. Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Sai Praneeth Prakhya

commit sha 591a6e8588fc1dbdc04355e8ad7b0be43d221c9c

selftests/resctrl: Add basic resctrl file system operations and data The basic resctrl file system operations and data are added for future usage by resctrl selftest tool. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Sai Praneeth Prakhya

commit sha 1d3f08687d76c0a80be35bbab6d64b1a8f2730e8

selftests/resctrl: Read memory bandwidth from perf IMC counter and from resctrl file system Total memory bandwidth can be monitored from perf IMC counter and from resctrl file system. Later the two will be compared to verify the total memory bandwidth read from resctrl is correct. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Sai Praneeth Prakhya

commit sha 7f4d257e3a2a4864268f6f0da7c85ca4f083e643

selftests/resctrl: Add callback to start a benchmark The callback starts a child process and puts the child pid in created resctrl group with specified memory bandwidth in schemata. The child starts running benchmark. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Sai Praneeth Prakhya

commit sha a2561b12fe392e0024e3187d325dbcbde7b99c04

selftests/resctrl: Add built in benchmark Built-in benchmark fill_buf generates stressful memory bandwidth and cache traffic. Later it will be used as a default benchmark by various resctrl tests such as MBA (Memory Bandwidth Allocation) and MBM (Memory Bandwidth Monitoring) tests. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Fenghua Yu

commit sha ecdbb911f22d6cc5d422571dedf048b889d71417

selftests/resctrl: Add MBM test MBM (Memory Bandwidth Monitoring) test is the first implemented selftest. It starts a stressful memory bandwidth benchmark and assigns the bandwidth pid in a resctrl monitoring group. Read and compare perf IMC counter and MBM total bytes for the benchmark. The numbers should be close enough to pass the test. Default benchmark is built-in fill_buf. But users can specify their own benchmark by option "-b". We can add memory bandwidth monitoring for multiple processes in the future. Co-developed-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Fenghua Yu

commit sha 01fee6b4d1f93247e806dddb0065b88317949085

selftests/resctrl: Add MBA test MBA (Memory Bandwidth Allocation) test starts a stressful memory bandwidth benchmark and allocates memory bandwidth from 100% down to 10% for the benchmark process. For each allocation, compare perf IMC counter and mbm total bytes from resctrl. The difference between the two values should be within a threshold to pass the test. Default benchmark is built-in fill_buf. But users can specify their own benchmark by option "-b". We can add memory bandwidth allocation for multiple processes in the future. Co-developed-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Fenghua Yu

commit sha 78941183d1b151317beb37b25690b7d87fe2596d

selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest Cache QoS Monitoring (CQM) selftest starts stressful cache benchmark with specified size of memory to access the cache. Last Level cache occupancy reported by CQM should be close to the size of the memory. Co-developed-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Fenghua Yu

commit sha 790bf585b0eeec9aa0e680ba090142b98da7f948

selftests/resctrl: Add Cache Allocation Technology (CAT) selftest Cache Allocation Technology (CAT) selftest allocates a portion of last level cache and starts a benchmark to read each cache line in this portion of cache. Measure the cache misses in perf and the misses should be equal to the number of cache lines in this portion of cache. We don't use CQM to calculate cache usage because some CAT enabled platforms don't have CQM. Co-developed-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Co-developed-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Babu Moger

commit sha 53f74fbec9f069241aa6f4c4d908229c77ee83e2

selftests/resctrl: Add vendor detection mechanism RESCTRL feature is supported both on Intel and AMD now. Some features are implemented differently. Add vendor detection mechanism. Use the vendor check where there are differences. Signed-off-by: Babu Moger <babu.moger@amd.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Babu Moger

commit sha c0327e1d7c42adb31a9541de6812f740893177a1

selftests/resctrl: Use cache index3 id for AMD schemata masks AMD uses the cache l3 boundary for schemata masks. Update it accordigly. Signed-off-by: Babu Moger <babu.moger@amd.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Babu Moger

commit sha 85f553d24ada5399cab730e3199a970e2cc78c82

selftests/resctrl: Disable MBA and MBM tests for AMD For now, disable MBA and MBM tests for AMD. Deciding test pass/fail is not clear right now. We can enable when we have some clarity. Signed-off-by: Babu Moger <babu.moger@amd.com> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Fenghua Yu

commit sha 3032e3a7c7e3ef4a9839ec99c60ffeed76830bf2

selftests/resctrl: Add the test in MAINTAINERS The resctrl selftest will be maintained by RDT maintainers. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Colin Ian King

commit sha 14f4283aa3e69a68e0a2e3bdb8a651a39bf18c52

selftests/resctrl: fix spelling mistake "Errror" -> "Error" There are two spelling mistakes in error messages. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

YueHaibing

commit sha 785c4e834f5f34dc00a398e935a89fd38416ff64

selftests/timens: Remove duplicated include <time.h> Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Masanari Iida

commit sha 9c249ec312dbdb3ef05f5b2672c8c1d1f7562269

selftests/ftrace: Fix typo in trigger-multihist.tc This patch fix a spelling typo in trigger-multihist.tc Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Kees Cook

commit sha 1ae81d78a8b2f6314fb90dc4296202611e710b37

selftests/seccomp: Adjust test fixture counts The seccomp selftest reported the wrong test counts since it was using slightly the wrong API for defining text fixtures. Adjust the API usage. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Shuah Khan

commit sha 29e911ef7b706215caf02a82b0d3076611d6abe8

selftests: Fix kselftest O=objdir build from cluttering top level objdir make kselftest-all O=objdir builds create generated objects in objdir. This clutters the top level directory with kselftest objects. Fix it to create sub-directory under objdir for kselftest objects. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Shuah Khan

commit sha 1dc74544edc6753d628fcd1059559fb0ddc1b422

selftests: android: ion: Fix ionmap_test compile error ionmap_test compile rule is missing ipcsocket.c dependency. Add it to fix the following compile errors: ..android/ion/ionutils.c:221: undefined reference to `sendtosocket' ..android/ion/ionutils.c:243: undefined reference to `receivefromsocket' Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

Shuah Khan

commit sha fb0bb3952401eac4e5a39551239f4e3bc9aeee4a

selftests: android: Fix custom install from skipping test progs Update custom install rule to install all generated test programs. This fixes android/ion tests to be installed correctly. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

view details

push time in 5 days

push eventtorvalds/linux

Uwe Kleine-König

commit sha fcdba3c33a4dfdc3dc3f2cce46aaf0d6fc1df6c1

hwrng: imx-rngc - improve dependencies The imx-rngc driver binds to devices that are compatible to "fsl,imx25-rngb". Grepping through the device tree sources suggests this only exists on i.MX25. So restrict dependencies to configs that have this SoC enabled, but allow compile testing. For the latter additional dependencies for clk and readl/writel are necessary. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Daniel Jordan

commit sha 41ccdbfd5427bbbf3ed58b16750113b38fad1780

padata: fix uninitialized return value in padata_replace() According to Geert's report[0], kernel/padata.c: warning: 'err' may be used uninitialized in this function [-Wuninitialized]: => 539:2 Warning is seen only with older compilers on certain archs. The runtime effect is potentially returning garbage down the stack when padata's cpumasks are modified before any pcrypt requests have run. Simplest fix is to initialize err to the success value. [0] http://lkml.kernel.org/r/20200210135506.11536-1-geert@linux-m68k.org Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: bbefa1dd6a6d ("crypto: pcrypt - Avoid deadlock by using per-instance padata queues") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Kenneth Lee

commit sha aa017ab97a223d55f4955e2864daf7a05ba1f8a2

uacce: Add documents for uacce Uacce (Unified/User-space-access-intended Accelerator Framework) is a kernel module targets to provide Shared Virtual Addressing (SVA) between the accelerator and process. This patch add document to explain how it works. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com> Signed-off-by: Zaibo Xu <xuzaibo@huawei.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Kenneth Lee

commit sha 015d239ac0142ad0e26567fd890ef8d171f13709

uacce: add uacce driver Uacce (Unified/User-space-access-intended Accelerator Framework) targets to provide Shared Virtual Addressing (SVA) between accelerators and processes. So accelerator can access any data structure of the main cpu. This differs from the data sharing between cpu and io device, which share only data content rather than address. Since unified address, hardware and user space of process can share the same virtual address in the communication. Uacce create a chrdev for every registration, the queue is allocated to the process when the chrdev is opened. Then the process can access the hardware resource by interact with the queue file. By mmap the queue file space to user space, the process can directly put requests to the hardware without syscall to the kernel space. The IOMMU core only tracks mm<->device bonds at the moment, because it only needs to handle IOTLB invalidation and PASID table entries. However uacce needs a finer granularity since multiple queues from the same device can be bound to an mm. When the mm exits, all bound queues must be stopped so that the IOMMU can safely clear the PASID table entry and reallocate the PASID. An intermediate struct uacce_mm links uacce devices and queues. Note that an mm may be bound to multiple devices but an uacce_mm structure only ever belongs to a single device, because we don't need anything more complex (if multiple devices are bound to one mm, then we'll create one uacce_mm for each bond). uacce_device --+-- uacce_mm --+-- uacce_queue | '-- uacce_queue | '-- uacce_mm --+-- uacce_queue +-- uacce_queue '-- uacce_queue Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com> Signed-off-by: Zaibo Xu <xuzaibo@huawei.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Zhangfei Gao

commit sha 18bead70e9919bb2d1826c4070f2982dd63e2fcc

crypto: hisilicon - Remove module_param uacce_mode Remove the module_param uacce_mode, which is not used currently. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Zhangfei Gao

commit sha 9e00df7156e45e42c695ffc596b4bf1328d00516

crypto: hisilicon - register zip engine to uacce Register qm to uacce framework for user crypto driver Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha 30332eeefec8f83afcea00c360f99ef64b87f220

debugfs: regset32: Add Runtime PM support Hardware registers of devices under control of power management cannot be accessed at all times. If such a device is suspended, register accesses may lead to undefined behavior, like reading bogus values, or causing exceptions or system lock-ups. Extend struct debugfs_regset32 with an optional field to let device drivers specify the device the registers in the set belong to. This allows debugfs_show_regset32() to make sure the device is resumed while its registers are being read. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha f5f7e1a049e685045a0a6cfd508aab79621446b8

crypto: ccree - fix debugfs register access while suspended Reading the debugfs files under /sys/kernel/debug/ccree/ can be done by the user at any time. On R-Car SoCs, the CCREE device is power-managed using a moduile clock, and if this clock is not running, bogus register values may be read. Fix this by filling in the debugfs_regset32.dev field, so debugfs will make sure the device is resumed while its registers are being read. This fixes the bogus values (0x00000260) in the register dumps on R-Car H3 ES1.0: -e6601000.crypto/regs:HOST_IRR = 0x00000260 -e6601000.crypto/regs:HOST_POWER_DOWN_EN = 0x00000260 +e6601000.crypto/regs:HOST_IRR = 0x00000038 +e6601000.crypto/regs:HOST_POWER_DOWN_EN = 0x00000038 e6601000.crypto/regs:AXIM_MON_ERR = 0x00000000 e6601000.crypto/regs:DSCRPTR_QUEUE_CONTENT = 0x000002aa -e6601000.crypto/regs:HOST_IMR = 0x00000260 +e6601000.crypto/regs:HOST_IMR = 0x017ffeff e6601000.crypto/regs:AXIM_CFG = 0x001f0007 e6601000.crypto/regs:AXIM_CACHE_PARAMS = 0x00000000 -e6601000.crypto/regs:GPR_HOST = 0x00000260 +e6601000.crypto/regs:GPR_HOST = 0x017ffeff e6601000.crypto/regs:AXIM_MON_COMP = 0x00000000 -e6601000.crypto/version:SIGNATURE = 0x00000260 -e6601000.crypto/version:VERSION = 0x00000260 +e6601000.crypto/version:SIGNATURE = 0xdcc63000 +e6601000.crypto/version:VERSION = 0xaf400001 Note that this behavior is system-dependent, and the issue does not show up on all R-Car Gen3 SoCs and boards. Even when the device is suspended, the module clock may be left enabled, if configured by the firmware for Secure Mode, or when controlled by the Real-Time Core. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Gilad Ben-Yossef <gilad@benyossef.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@ragnatech.se> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha b83fd3e5ec284453561784491d877ceff950d1c6

crypto: ccree - fix retry handling in cc_send_sync_request() If cc_queues_status() indicates that the queue is full, cc_send_sync_request() should loop and retry. However, cc_queues_status() returns either 0 (for success), or -ENOSPC (for queue full), while cc_send_sync_request() checks for real errors by comparing with -EAGAIN. Hence -ENOSPC is always considered a real error, and the code never retries the operation. Fix this by just removing the check, as cc_queues_status() never returns any other error value than -ENOSPC. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha f4274eeca476305a78f140e2e74bd07bd379a5da

crypto: ccree - remove unneeded casts Unneeded casts prevent the compiler from performing valuable checks. This is especially true for function pointers. Remove these casts, to prevent silently introducing bugs when a variable's type might be changed in the future. No change in generated code. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha f08b58501c74d6ec0828b55a0d4e0b2e840c2b9e

crypto: ccree - swap SHA384 and SHA512 larval hashes at build time Due to the way the hardware works, every double word in the SHA384 and SHA512 larval hashes must be swapped. Currently this is done at run time, during driver initialization. However, this swapping can easily be done at build time. Treating each double word as two words has the benefit of changing the larval hashes' types from u64[] to u32[], like for all other hashes, and allows dropping the casts and size doublings when calling cc_set_sram_desc(). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha 08e8cb119f5ab434c398d865b16f9d7bdb76b0b0

crypto: ccree - drop duplicated error message on SRAM exhaustion When no SRAM can be allocated, cc_sram_alloc() already prints an error message. Hence there is no need to duplicate this in all callers. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha fc3b8c11aab728a536b6bb89d6760df9c8694d23

crypto: ccree - remove empty cc_sram_mgr_fini() cc_sram_mgr_fini() doesn't do anything, so it can just be removed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha 2f272ef37c29425a6526fec976cf200dda950ee1

crypto: ccree - clean up clock handling Use devm_clk_get_optional() instead of devm_clk_get() and explicit optional clock handling. As clk_prepare_enable() and clk_disable_unprepare() handle optional clocks fine, the cc_clk_on() and cc_clk_off() wrappers can be removed. While at it, use the new "%pe" format specifier to print error codes. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha ba99b6f9bd59ff47ca5d1c9cfa4c6a6bd5499d76

crypto: ccree - make mlli_params.mlli_virt_addr void * mlli_params.mlli_virt_addr is just a buffer of memory. This allows to drop a cast. No change in generated code. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha 5fabab0d36d333c1d182369c845f6c7546a4afb8

crypto: ccree - use existing helpers to split 64-bit addresses Use the existing lower_32_bits() and upper_32_bits() macros instead of explicit casts and shifts to split a 64-bit address in its two 32-bit parts. Drop the superfluous cast to "u16", as the FIELD_PREP() macro already masks it to the specified field width. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha e431cc04381708660b321f6c5a5500ea577a903e

crypto: ccree - defer larval_digest_addr init until needed While the larval digest addresses are not always used in cc_get_plain_hmac_key() and cc_hash_digest(), they are always calculated. Defer their calculations to the points where needed. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha 37282f8d157169198e433d1f63e19a713267516c

crypto: ccree - remove bogus paragraph about freeing SRAM The SRAM allocator does not support deallocating memory. Hence remove all references to freeing SRAM. Fix grammar while at it. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha 1a895f1d5bceee0c6b66dcfdabcc9804006ca229

crypto: ccree - use u32 for SRAM addresses SRAM addresses are small integer offsets into local SRAM. Currently they are stored using a mixture of cc_sram_addr_t (u64), u32, and dma_addr_t types. Settle on u32, and remove the cc_sram_addr_t typedefs. This allows to drop several casts. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

Geert Uytterhoeven

commit sha 8c7849a30255cfd9a9ba412b1517e20a8572448b

crypto: ccree - simplify Runtime PM handling Currently, a large part of the probe function runs before Runtime PM is enabled. As the driver manages the device's clock manually, this may work fine on some systems, but may break on platforms with a more complex power hierarchy. Fix this by moving the initialization of Runtime PM before the first register access (in cc_wait_for_reset_completion()), and putting the device to sleep only after the last access (in cc_set_ree_fips_status()). This allows to remove the pm_on flag, which was used to track manually if Runtime PM had been enabled or not. Remove the cc_pm_{init,go,fini}() wrappers, as they are called only once, and obscure operation. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

view details

push time in 5 days

push eventtorvalds/linux

Linus Torvalds

commit sha ab33eb494c603bb200fbe3c3c018fc9c544487b8

x86: remove __put_user_asm() infrastructure The last user was removed by commit 4b842e4e25b1 ("x86: get rid of small constant size cases in raw_copy_{to,from}_user()"). Get rid of the left-overs before somebody tries to use it again. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

view details

Linus Torvalds

commit sha 1a323ea5356edbb3073dc59d51b9e6b86908857d

x86: get rid of 'errret' argument to __get_user_xyz() macross Every remaining user just has the error case returning -EFAULT. In fact, the exception was __get_user_asm_nozero(), which was removed in commit 4b842e4e25b1 ("x86: get rid of small constant size cases in raw_copy_{to,from}_user()"), and the other __get_user_xyz() macros just followed suit for consistency. Fix up some macro whitespace while at it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

view details

push time in 6 days

push eventtorvalds/linux

Jakub Kicinski

commit sha d5e3c87d302cd08d01ac041cc3f8049c979cb97d

net: fec: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha 4db086932370c6b134910eb8f06e3a2b3b57701d

net: gianfar: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha 4f9546d24a12e9041eae20574ab244f5410e02c4

net: hns: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha 7b8fda64b29d433712575c99eb97545899667d06

net: hns3: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha 86f0f963f8db967c9350102f6c5b40e8c053cab7

net: e1000: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha 194219a79259390dd7b69208c41f39d096139014

net: fm10k: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha 5f85d407ed4bf90160f22929fd5cd49299b59374

net: i40e: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha cf5d0f1c24b1aa23f84983adacfeab3365ce2cc8

net: iavf: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha a289108c2a62bf712558942ac963c8eef42ac3af

net: igb: let core reject the unsupported coalescing parameters Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver was rejecting almost all unsupported parameters already, it was only missing a check for tx_max_coalesced_frames_irq. As a side effect of these changes the error code for unsupported params changes from ENOTSUPP to EOPNOTSUPP. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha 3ff8000ddc7d940e258dcfd848ee7bc44481a73a

net: igbvf: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha dbfa497a26e146b9b9401b3c4e8e2ed69ebdd4ad

net: igc: let core reject the unsupported coalescing parameters Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver was rejecting almost all unsupported parameters already, it was only missing a check for tx_max_coalesced_frames_irq. As a side effect of these changes the error code for unsupported params changes from ENOTSUPP to EOPNOTSUPP. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha eb7975d3789f6346999ee82a20ddce4df9a56f3f

net: ixgbe: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Jakub Kicinski

commit sha e259b9114b1e352940299a22faec8bc50cb746b8

net: ixgbevf: reject unsupported coalescing params Set ethtool_ops->supported_coalesce_params to let the core reject unsupported coalescing parameters. This driver did not previously reject unsupported parameters. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

David S. Miller

commit sha af91fd7e17f12119763b95ccf526bce4b0693144

Merge branch 'ethtool-consolidate-irq-coalescing-part-4' Jakub Kicinski says: ==================== ethtool: consolidate irq coalescing - part 4 Convert more drivers following the groundwork laid in a recent patch set [1] and continued in [2], [3]. The aim of the effort is to consolidate irq coalescing parameter validation in the core. This set converts 15 drivers in drivers/net/ethernet - remaining Intel drivers, Freescale/NXP, and others. 2 more conversion sets to come. [1] https://lore.kernel.org/netdev/20200305051542.991898-1-kuba@kernel.org/ [2] https://lore.kernel.org/netdev/20200306010602.1620354-1-kuba@kernel.org/ [3] https://lore.kernel.org/netdev/20200310021512.1861626-1-kuba@kernel.org/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Kuniyuki Iwashima

commit sha 16f6c2518f9e0347eb54d368473ebd0904ac4298

tcp: Remove unnecessary conditions in inet_csk_bind_conflict(). When we get an ephemeral port, the relax is false, so the SO_REUSEADDR conditions may be evaluated twice. We do not need to check the conditions again. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Kuniyuki Iwashima

commit sha 4b01a9674231a97553a55456d883f584e948a78d

tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are exhausted. Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger condition for bind_conflict") introduced a restriction to forbid to bind SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to assign ports dispersedly so that we can connect to the same remote host. The change results in accelerating port depletion so that we fail to bind sockets to the same local port even if we want to connect to the different remote hosts. You can reproduce this issue by following instructions below. 1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768" 2. set SO_REUSEADDR to two sockets. 3. bind two sockets to (localhost, 0) and the latter fails. Therefore, when ephemeral ports are exhausted, bind(0) should fallback to the legacy behaviour to enable the SO_REUSEADDR option and make it possible to connect to different remote (addr, port) tuples. This patch allows us to bind SO_REUSEADDR enabled sockets to the same (addr, port) only when net.ipv4.ip_autobind_reuse is set 1 and all ephemeral ports are exhausted. This also allows connect() and listen() to share ports in the following way and may break some applications. So the ip_autobind_reuse is 0 by default and disables the feature. 1. setsockopt(sk1, SO_REUSEADDR) 2. setsockopt(sk2, SO_REUSEADDR) 3. bind(sk1, saddr, 0) 4. bind(sk2, saddr, 0) 5. connect(sk1, daddr) 6. listen(sk2) If it is set 1, we can fully utilize the 4-tuples, but we should use IP_BIND_ADDRESS_NO_PORT for bind()+connect() as possible. The notable thing is that if all sockets bound to the same port have both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an ephemeral port and also do listen(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Kuniyuki Iwashima

commit sha 335759211a327d61244580070d74f55561c35895

tcp: Forbid to bind more than one sockets haveing SO_REUSEADDR and SO_REUSEPORT per EUID. If there is no TCP_LISTEN socket on a ephemeral port, we can bind multiple sockets having SO_REUSEADDR to the same port. Then if all sockets bound to the port have also SO_REUSEPORT enabled and have the same EUID, all of them can be listened. This is not safe. Let's say, an application has root privilege and binds sockets to an ephemeral port with both of SO_REUSEADDR and SO_REUSEPORT. When none of sockets is not listened yet, a malicious user can use sudo, exhaust ephemeral ports, and bind sockets to the same ephemeral port, so he or she can call listen and steal the port. To prevent this issue, we must not bind more than one sockets that have the same EUID and both of SO_REUSEADDR and SO_REUSEPORT. On the other hand, if the sockets have different EUIDs, the issue above does not occur. After sockets with different EUIDs are bound to the same port and one of them is listened, no more socket can be listened. This is because the condition below is evaluated true and listen() for the second socket fails. } else if (!reuseport_ok || !reuseport || !sk2->sk_reuseport || rcu_access_pointer(sk->sk_reuseport_cb) || (sk2->sk_state != TCP_TIME_WAIT && !uid_eq(uid, sock_i_uid(sk2)))) { if (inet_rcv_saddr_equal(sk, sk2, true)) break; } Therefore, on the same port, we cannot do listen() for multiple sockets with different EUIDs and any other listen syscalls fail, so the problem does not happen. In this case, we can still call connect() for other sockets that cannot be listened, so we have to succeed to call bind() in order to fully utilize 4-tuples. Summarizing the above, we should be able to bind only one socket having SO_REUSEADDR and SO_REUSEPORT per EUID. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

Kuniyuki Iwashima

commit sha 7f204a7de8b08542aca3c1daa96ed20e1177ba87

selftests: net: Add SO_REUSEADDR test to check if 4-tuples are fully utilized. This commit adds a test to check if we can fully utilize 4-tuples for connect() when all ephemeral ports are exhausted. The test program changes the local port range to use only one port and binds two sockets with or without SO_REUSEADDR and SO_REUSEPORT, and with the same EUID or with different EUIDs, then do listen(). We should be able to bind only one socket having both SO_REUSEADDR and SO_REUSEPORT per EUID, which restriction is to prevent unintentional listen(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

David S. Miller

commit sha 93e616131a3825189570524d4792b5223bf1d68c

Merge branch 'bind_addr_zero' Kuniyuki Iwashima says: ==================== Improve bind(addr, 0) behaviour. Currently we fail to bind sockets to ephemeral ports when all of the ports are exhausted even if all sockets have SO_REUSEADDR enabled. In this case, we still have a chance to connect to the different remote hosts. These patches add net.ipv4.ip_autobind_reuse option and fix the behaviour to fully utilize all space of the local (addr, port) tuples. Changes in v5: - Add more description to documents. - Fix sysctl option to use proc_dointvec_minmax. - Remove the Fixes: tag and squash two commits. Changes in v4: - Add net.ipv4.ip_autobind_reuse option to not change the current behaviour. - Modify .gitignore for test. https://lore.kernel.org/netdev/20200308181615.90135-1-kuniyu@amazon.co.jp/ Changes in v3: - Change the title and write more specific description of the 3rd patch. - Add a test in tools/testing/selftests/net/ as the 4th patch. https://lore.kernel.org/netdev/20200229113554.78338-1-kuniyu@amazon.co.jp/ Changes in v2: - Change the description of the 2nd patch ('localhost' -> 'address'). - Correct the description and the if statement of the 3rd patch. https://lore.kernel.org/netdev/20200226074631.67688-1-kuniyu@amazon.co.jp/ v1 with tests: https://lore.kernel.org/netdev/20200220152020.13056-1-kuniyu@amazon.co.jp/ ==================== Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

view details

David S. Miller

commit sha bf3347c4d15e26ab17fce3aa4041345198f4280c

Merge branch 'ct-offload' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

view details

push time in 6 days

push eventtorvalds/linux

Andy Shevchenko

commit sha d545514e3e36fa69450a0b712664943c4bfde427

MAINTAINERS: Sort entries in database for PDx86 Run parse-maintainers.pl and choose PDx86 records. Fix them accordingly. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Gayatri Kammela

commit sha 16292bed9c56a20715d942fd5d9e025f01fa65fe

platform/x86: intel_pmc_core: Add Atom based Jasper Lake (JSL) platform support Add Jasper Lake to the list of the platforms that intel_pmc_core driver supports for pmc_core device. Just like Ice Lake, Tiger Lake and Elkhart Lake, Jasper Lake can also reuse all the Cannon Lake PCH IPs. Also, it uses the same PCH IPs of Tiger Lake, no additional effort is needed to enable but to simply reuse them. Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Box <david.e.box@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha 57ba2633a1b618816fdee85d5baa52f1874ee07f

platform/x86: intel-hid: Move MODULE_DEVICE_TABLE() closer to the table Move MODULE_DEVICE_TABLE() closer to the table for better maintenance. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha 807e92d1bdd0e49771390e546a52dfe05855c13e

platform/x86: intel-vbtn: Move MODULE_DEVICE_TABLE() closer to the table Move MODULE_DEVICE_TABLE() closer to the table for better maintenance. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha d82d3ef66d91e288b54c2bf5cbba9ec0c106ff50

platform/x86: Makefile: Group modules by companies and functions For better maintenance group modules by companies and functions. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha 45a3d578f2edb5b06fa3c151569959c819a808ca

platform/x86: Kconfig: Group modules by companies and functions For better maintenance group modules by companies and functions. No functional change intended. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha 94ed313404d8c345148239a873f2154ecb3d0c37

platform/x86: dell_rbu: Use sysfs_create_group() API Use sysfs_create_group() API instead of open coded variant. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha d19f359fbdc6b5d49e9b9a0db27a996b28a2ded3

platform/x86: dell_rbu: don't open code list_for_each_entry*() The loop declaration in packet_read_list() and packet_empty_list() can be simplified by reusing the common list_for_each_entry*() helper macros. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha 45e21277f94753c3e3429e63e5785888ee1ead4a

platform/x86: dell_rbu: Simplify cleanup code in create_packet() The code looks more nicer if we use: while (idx--) instead: for (;idx>0;idx--) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha 682baa24e2a2adbf4db615eae38a553fb16564b4

platform/x86: dell_rbu: Use max_t() to get rid of casting There is no need to cast both values in max_t() macro, so, use it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha e5e325722f43c9ed5f7a21d4b7485c36ac0eccc9

platform/x86: dell_rbu: Unify format of the printed messages Here we do: - unify all message to have module name prefix via pr_fmt() - replace printk(LEVEL ... ) with pr_level(...) - drop __func__ from messages - join string literals to be on one line - drop trailing periods and spaces Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Srinivas Pandruvada

commit sha 14a8aa4964e01aafcd684499315d3d4ea1de428f

tools/power/x86/intel-speed-select: Fix display for turbo-freq auto mode When mailbox command for the turbo-freq enable fails, then don't display result for auto-mode. When turbo-freq enable fails, there is no point to set CPU priorities. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Srinivas Pandruvada

commit sha 3b0fe3bab31fa7b320667e3d001eaa3f23590c05

tools/power/x86/intel-speed-select: Avoid duplicate names for json parsing For the command "intel-speed-select perf-profile info": There are two instances of “speed-select-turbo-freq” underneath “perf-profile-level-0” for each package. When we load the output into python with json.load(), the second instance overwrites the first. Result is that we can only access: "speed-select-turbo-freq": { "bucket-0": { "high-priority-cores-count": "2", "high-priority-max-frequency(MHz)": "3000", "high-priority-max-avx2-frequency(MHz)": "2800", "high-priority-max-avx512-frequency(MHz)": "2600" }, Because it is a duplicate of "speed-select-turbo-freq": "disabled" Same is true for "speed-select-base-freq". To avoid this add "-properties" suffix for the second instance to differentiate. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Georg Müller

commit sha 95b31e35239e5e1689e3d965d692a313c71bd8ab

platform/x86: pmc_atom: Add Lex 2I385SW to critclk_systems DMI table The Lex 2I385SW board has two Intel I211 ethernet controllers. Without this patch, only the first port is usable. The second port fails to start with the following message: igb: probe of 0000:02:00.0 failed with error -2 Fixes: 648e921888ad ("clk: x86: Stop marking clocks as CLK_IS_CRITICAL") Tested-by: Georg Müller <georgmueller@gmx.net> Signed-off-by: Georg Müller <georgmueller@gmx.net> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Gayatri Kammela

commit sha a45096ac70e59498ef3d1fe67ab6a10dbccf59ef

platform/x86: intel_pmc_core: Add debugfs entry to access sub-state residencies Prior to Tiger Lake, the platforms that support pmc_core have no sub-states of S0ix. Tiger Lake has 8 sub-states/low power modes of S0ix ranging from S0i2.0-S0i2.2 and S0i3.0-S0i3.4, simply represented as S0ix.y. Create a debugfs entry to access residency of each sub-state. Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Box <david.e.box@intel.com> Signed-off-by: David Box <david.e.box@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Gayatri Kammela

commit sha f632817d5ef369a6f433449a1b8fa26627fc40e0

platform/x86: intel_pmc_core: Add debugfs entry for low power mode status registers Tiger Lake has 6 status registers that are memory mapped. These registers show the status of the low power mode requirements. The registers are latched on every C10 entry or exit and on every s0ix.y entry/exit. Accessing these registers is useful for debugging any low power related activities. Thus, add debugfs entry to access low power mode status registers. Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Box <david.e.box@intel.com> Signed-off-by: David Box <david.e.box@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Gayatri Kammela

commit sha 4d6a63e0b99ea4ba7f718ab084ebf566b7d1585f

platform/x86: intel_pmc_core: Refactor the driver by removing redundant code pmc_core_slps0_dbg_show() is responsible for displaying debug registers through slp_s0_debug_status entry. The driver uses the same but redundant code to dump these debug registers for an S0ix failure. Hence, refactor the driver by removing redundant code and reuse the same function that both dumps registers through slp_s0_debug_status entry and in resume for an S0ix failure. The changes in this patch are preparatory, so platforms that support low power sub-states can dump the debug registers when the attempt to enter low power states are unsuccessful. Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Box <david.e.box@intel.com> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Gayatri Kammela

commit sha a018e28f0880c1eaa72b09d2ec64831024d149a6

platform/x86: intel_pmc_core: Remove slp_s0 attributes from tgl_reg_map If platforms such as Tiger Lake has sub-states of S0ix, then both slp_s0_debug_status and slp_s0_dbg_latch entries become invalid. Thus, remove slp_s0_offset and slp_s0_dbg_maps attributes from tgl_reg_map, so that both the entries are not created. Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Box <david.e.box@intel.com> Suggested-by: David Box <david.e.box@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Gayatri Kammela

commit sha 913f984a8347ea967ee9693dfa1e15da78e64661

platform/x86: intel_pmc_core: Add an additional parameter to pmc_core_lpm_display() Add a device pointer of type struct device as an additional parameter to pmc_core_lpm_display(), so that the driver can re-use it to dump the debug registers in resume for an S0ix failure. Cc: Srinivas Pandruvada <srinivas.pandruvada@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: David Box <david.e.box@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

view details

Andy Shevchenko

commit sha 8e217b078138b6dadcef3080a5450ecd6767d09f

kgdboc: Use for_each_console() helper Replace open coded single-linked list iteration loop with for_each_console() helper in use. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200124161132.65519-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

view details

push time in 6 days

push eventtorvalds/linux

Randy Dunlap

commit sha bd1a5a53d7c1231f89c51705f6a894516b40aa88

security: <linux/lsm_hooks.h>: fix all kernel-doc warnings Fix all kernel-doc warnings in <linux/lsm_hooks.h>. Fixes the following warnings: ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'quotactl' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'quota_on' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'sb_free_mnt_opts' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'sb_eat_lsm_opts' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'sb_kern_mount' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'sb_show_options' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'sb_add_mnt_opt' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'd_instantiate' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'getprocattr' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'setprocattr' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'locked_down' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'perf_event_open' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'perf_event_alloc' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'perf_event_free' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'perf_event_read' not described in 'security_list_options' ../include/linux/lsm_hooks.h:1830: warning: Function parameter or member 'perf_event_write' not described in 'security_list_options' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: John Johansen <john.johansen@canonical.com> Cc: Kees Cook <keescook@chromium.org> Cc: Micah Morton <mortonm@chromium.org> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: linux-security-module@vger.kernel.org Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@parisplace.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: James Morris <jmorris@namei.org>

view details

YueHaibing

commit sha 3e27a33932df104f4f9ff811467b0b4ccebde773

security: remove duplicated include from security.h Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: James Morris <jmorris@namei.org>

view details

Linus Torvalds

commit sha a16298439bd5469d89ec0e575e1c26e7b9a8178a

Merge branch 'next-general' of git://git.kernel.org:/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Two minor updates for the core security subsystem: - kernel-doc warning fixes from Randy Dunlap - header cleanup from YueHaibing" * 'next-general' of git://git.kernel.org:/pub/scm/linux/kernel/git/jmorris/linux-security: security: remove duplicated include from security.h security: <linux/lsm_hooks.h>: fix all kernel-doc warnings

view details

push time in 6 days

push eventtorvalds/linux

Steve Grubb

commit sha 70b3eeed49e8190d97139806f6fbaf8964306cdb

audit: CONFIG_CHANGE don't log internal bookkeeping as an event Common Criteria calls out for any action that modifies the audit trail to be recorded. That usually is interpreted to mean insertion or removal of rules. It is not required to log modification of the inode information since the watch is still in effect. Additionally, if the rule is a never rule and the underlying file is one they do not want events for, they get an event for this bookkeeping update against their wishes. Since no device/inode info is logged at insertion and no device/inode information is logged on update, there is nothing meaningful being communicated to the admin by the CONFIG_CHANGE updated_rules event. One can assume that the rule was not "modified" because it is still watching the intended target. If the device or inode cannot be resolved, then audit_panic is called which is sufficient. The correct resolution is to drop logging config_update events since the watch is still in effect but just on another unknown inode. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 4b36cb773a8153417a080f8025d522322f915aea

selinux: move status variables out of selinux_ss It fits more naturally in selinux_state, since it reflects also global state (the enforcing and policyload fields). Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Stephen Smalley

commit sha e9c38f9fc2ccd31befe1bb1605b69213483a15b7

Documentation,selinux: deprecate setting checkreqprot to 1 Deprecate setting the SELinux checkreqprot tunable to 1 via kernel parameter or /sys/fs/selinux/checkreqprot. Setting it to 0 is left intact for compatibility since Android and some Linux distributions do so for security and treat an inability to set it as a fatal error. Eventually setting it to 0 will become a no-op and the kernel will stop using checkreqprot's value internally altogether. checkreqprot was originally introduced as a compatibility mechanism for legacy userspace and the READ_IMPLIES_EXEC personality flag. However, if set to 1, it weakens security by allowing mappings to be made executable without authorization by policy. The default value for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed from 1 to 0 in commit 2a35d196c160e3 ("selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android and Linux distributions began explicitly setting /sys/fs/selinux/checkreqprot to 0 some time ago. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 06c2efe2cf3aa70abbdf97e88641abca2e707a15

selinux: simplify evaluate_cond_node() It never fails, so it can just return void. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Christian Göttsche

commit sha 7470d0d13fb680bb82b40f18831f7d4ee7a4bb62

selinux: allow kernfs symlinks to inherit parent directory context Currently symlinks on kernel filesystems, like sysfs, are labeled on creation with the parent filesystem root sid. Allow symlinks to inherit the parent directory context, so fine-grained kernfs labeling can be applied to symlinks too and checking contexts doesn't complain about them. For backward-compatibility this behavior is contained in a new policy capability: genfs_seclabel_symlinks Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Vasily Averin

commit sha 8d269a8e2a8f0bca89022f4ec98de460acb90365

selinux: sel_avc_get_stat_idx should increase position index If seq_file .next function does not change position index, read after some lseek can generate unexpected output. $ dd if=/sys/fs/selinux/avc/cache_stats # usual output lookups hits misses allocations reclaims frees 817223 810034 7189 7189 6992 7037 1934894 1926896 7998 7998 7632 7683 1322812 1317176 5636 5636 5456 5507 1560571 1551548 9023 9023 9056 9115 0+1 records in 0+1 records out 189 bytes copied, 5,1564e-05 s, 3,7 MB/s $# read after lseek to midle of last line $ dd if=/sys/fs/selinux/avc/cache_stats bs=180 skip=1 dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset 056 9115 <<<< end of last line 1560571 1551548 9023 9023 9056 9115 <<< whole last line once again 0+1 records in 0+1 records out 45 bytes copied, 8,7221e-05 s, 516 kB/s $# read after lseek beyond end of of file $ dd if=/sys/fs/selinux/avc/cache_stats bs=1000 skip=1 dd: /sys/fs/selinux/avc/cache_stats: cannot skip to specified offset 1560571 1551548 9023 9023 9056 9115 <<<< generates whole last line 0+1 records in 0+1 records out 36 bytes copied, 9,0934e-05 s, 396 kB/s https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 60abd3181db29ea81742106cc0ac2e27fd05b418

selinux: convert cond_list to array Since it is fixed-size after allocation and we know the size beforehand, using a plain old array is simpler and more efficient. While there, also fix signedness of some related variables/parameters. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 2b3a003e1543ab47b2f150abe31df4e7a3f8dde8

selinux: convert cond_av_list to array Since it is fixed-size after allocation and we know the size beforehand, using a plain old array is simpler and more efficient. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 8794d7839038fc018e51d0afbf309b71069d9691

selinux: convert cond_expr to array Since it is fixed-size after allocation and we know the size beforehand, using a plain old array is simpler and more efficient. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Reviewed-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 89d4d7c88d2b4f252adb434a28ea9b84d629aeb1

selinux: generalize evaluate_cond_node() Both callers iterate the cond_list and call it for each node - turn it into evaluate_cond_nodes(), which does the iteration for them. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Connor O'Brien

commit sha 4ca54d3d3022ce27170b50e4bdecc3a42f05dbdc

security: selinux: allow per-file labeling for bpffs Add support for genfscon per-file labeling of bpffs files. This allows for separate permissions for different pinned bpf objects, which may be completely unrelated to each other. Signed-off-by: Connor O'Brien <connoro@google.com> Signed-off-by: Steven Moreland <smoreland@google.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 253050f57c7afe87d182f4029645568c2fc837f7

selinux: factor out loop body from filename_trans_read() It simplifies cleanup in the error path. This will be extra useful in later patch. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha c3a276111ea2572399281988b3129683e2a6b60b

selinux: optimize storage of filename transitions In these rules, each rule with the same (target type, target class, filename) values is (in practice) always mapped to the same result type. Therefore, it is much more efficient to group the rules by (ttype, tclass, filename). Thus, this patch drops the stype field from the key and changes the datum to be a linked list of one or more structures that contain a result type and an ebitmap of source types that map the given target to the given result type under the given filename. The size of the hash table is also incremented to 2048 to be more optimal for Fedora policy (which currently has ~2500 unique (ttype, tclass, filename) tuples, regardless of whether the 'unconfined' module is enabled). Not only does this dramtically reduce memory usage when the policy contains a lot of unconfined domains (ergo a lot of filename based transitions), but it also slightly reduces memory usage of strongly confined policies (modeled on Fedora policy with 'unconfined' module disabled) and significantly reduces lookup times of these rules on Fedora (roughly matches the performance of the rhashtable conversion patch [1] posted recently to selinux@vger.kernel.org). An obvious next step is to change binary policy format to match this layout, so that disk space is also saved. However, since that requires more work (including matching userspace changes) and this patch is already beneficial on its own, I'm posting it separately. Performance/memory usage comparison: Kernel | Policy load | Policy load | Mem usage | Mem usage | openbench | | (-unconfined) | | (-unconfined) | (createfiles) -----------------|-------------|---------------|-----------|---------------|-------------- reference | 1,30s | 0,91s | 90MB | 77MB | 55 us/file rhashtable patch | 0.98s | 0,85s | 85MB | 75MB | 38 us/file this patch | 0,95s | 0,87s | 75MB | 75MB | 40 us/file (Memory usage is measured after boot. With SELinux disabled the memory usage was ~60MB on the same system.) [1] https://lore.kernel.org/selinux/20200116213937.77795-1-dev@lynxeye.de/T/ Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Richard Haines

commit sha e4cfa05e9bfe286457082477b32ecd17737bdbce

selinux: Add xfs quota command types Add Q_XQUOTAOFF, Q_XQUOTAON and Q_XSETQLIM to trigger filesystem quotamod permission check. Add Q_XGETQUOTA, Q_XGETQSTAT, Q_XGETQSTATV and Q_XGETNEXTQUOTA to trigger filesystem quotaget permission check. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha e0ac568de1fa0a38bea6d3c69a894d913a5ca59d

selinux: reduce the use of hard-coded hash sizes Instead allocate hash tables with just the right size based on the actual number of elements (which is almost always known beforehand, we just need to defer the hashtab allocation to the right time). The only case when we don't know the size (with the current policy format) is the new filename transitions hashtable. Here I just left the existing value. After this patch, the time to load Fedora policy on x86_64 decreases from 790 ms to 167 ms. If the unconfined module is removed, it decreases from 750 ms to 122 ms. It is also likely that other operations are going to be faster, mainly string_to_context_struct() or mls_compute_sid(), but I didn't try to quantify that. The memory usage of all hash table arrays increases from ~58 KB to ~163 KB (with Fedora policy on x86_64). Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Stephen Smalley

commit sha e3e0b582c321aefd72db0e7083a0adfe285e96b5

selinux: remove unused initial SIDs and improve handling Remove initial SIDs that have never been used or are no longer used by the kernel from its string table, which is also used to generate the SECINITSID_* symbols referenced in code. Update the code to gracefully handle the fact that these can now be NULL. Stop treating it as an error if a policy defines additional initial SIDs unknown to the kernel. Do not load unused initial SID contexts into the sidtab. Fix the incorrect usage of the name from the ocontext in error messages when loading initial SIDs since these are not presently written to the kernel policy and are therefore always NULL. After this change, it is possible to safely reclaim and reuse some of the unused initial SIDs without compatibility issues. Specifically, unused initial SIDs that were being assigned the same context as the unlabeled initial SID in policies can be reclaimed and reused for another purpose, with existing policies still treating them as having the unlabeled context and future policies having the option of mapping them to a more specific context. For example, this could have been used when the infiniband labeling support was introduced to define initial SIDs for the default pkey and endport SIDs similar to the handling of port/netif/node SIDs rather than always using SECINITSID_UNLABELED as the default. The set of safely reclaimable unused initial SIDs across all known policies is igmp_packet (13), icmp_socket (14), tcp_socket (15), kmod (24), policy (25), and scmp_packet (26); these initial SIDs were assigned the same context as unlabeled in all known policies including mls. If only considering non-mls policies (i.e. assuming that mls users always upgrade policy with their kernels), the set of safely reclaimable unused initial SIDs further includes file_labels (6), init (7), sysctl_modprobe (16), and sysctl_fs (18) through sysctl_dev (23). Adding new initial SIDs beyond SECINITSID_NUM to policy unfortunately became a fatal error in commit 24ed7fdae669 ("selinux: use separate table for initial SID lookup") and even before that it could cause problems on a policy reload (collision between the new initial SID and one allocated at runtime) ever since commit 42596eafdd75 ("selinux: load the initial SIDs upon every policy load") so we cannot safely start adding new initial SIDs to policies beyond SECINITSID_NUM (27) until such a time as all such kernels do not need to be supported and only those that include this commit are relevant. That is not a big deal since we haven't added a new initial SID since 2004 (v2.6.7) and we have plenty of unused ones we can reclaim if we truly need one. If we want to avoid the wasted storage in initial_sid_to_string[] and/or sidtab->isids[] for the unused initial SIDs, we could introduce an indirection between the kernel initial SID values and the policy initial SID values and just map the policy SID values in the ocontexts to the kernel values during policy_load_isids(). Originally I thought we'd do this by preserving the initial SID names in the kernel policy and creating a mapping at load time like we do for the security classes and permissions but that would require a new kernel policy format version and associated changes to libsepol/checkpolicy and I'm not sure it is justified. Simpler approach is just to create a fixed mapping table in the kernel from the existing fixed policy values to the kernel values. Less flexible but probably sufficient. A separate selinux userspace change was applied in https://github.com/SELinuxProject/selinux/commit/8677ce5e8f592950ae6f14cea1b68a20ddc1ac25 to enable removal of most of the unused initial SID contexts from policies, but there is no dependency between that change and this one. That change permits removing all of the unused initial SID contexts from policy except for the fs and sysctl SID contexts. The initial SID declarations themselves would remain in policy to preserve the values of subsequent ones but the contexts can be dropped. If/when the kernel decides to reuse one of them, future policies can change the name and start assigning a context again without breaking compatibility. Here is how I would envision staging changes to the initial SIDs in a compatible manner after this commit is applied: 1. At any time after this commit is applied, the kernel could choose to reclaim one of the safely reclaimable unused initial SIDs listed above for a new purpose (i.e. replace its NULL entry in the initial_sid_to_string[] table with a new name and start using the newly generated SECINITSID_name symbol in code), and refpolicy could at that time rename its declaration of that initial SID to reflect its new purpose and start assigning it a context going forward. Existing/old policies would map the reclaimed initial SID to the unlabeled context, so that would be the initial default behavior until policies are updated. This doesn't depend on the selinux userspace change; it will work with existing policies and userspace. 2. In 6 months or so we'll have another SELinux userspace release that will include the libsepol/checkpolicy support for omitting unused initial SID contexts. 3. At any time after that release, refpolicy can make that release its minimum build requirement and drop the sid context statements (but not the sid declarations) for all of the unused initial SIDs except for fs and sysctl, which must remain for compatibility on policy reload with old kernels and for compatibility with kernels that were still using SECINITSID_SYSCTL (< 2.6.39). This doesn't depend on this kernel commit; it will work with previous kernels as well. 4. After N years for some value of N, refpolicy decides that it no longer cares about policy reload compatibility for kernels that predate this kernel commit, and refpolicy drops the fs and sysctl SID contexts from policy too (but retains the declarations). 5. After M years for some value of M, the kernel decides that it no longer cares about compatibility with refpolicies that predate step 4 (dropping the fs and sysctl SIDs), and those two SIDs also become safely reclaimable. This step is optional and need not ever occur unless we decide that the need to reclaim those two SIDs outweighs the compatibility cost. 6. After O years for some value of O, refpolicy decides that it no longer cares about policy load (not just reload) compatibility for kernels that predate this kernel commit, and both kernel and refpolicy can then start adding and using new initial SIDs beyond 27. This does not depend on the previous change (step 5) and can occur independent of it. Fixes: https://github.com/SELinuxProject/selinux-kernel/issues/12 Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Ondrej Mosnacek

commit sha 34a2dab488bcaf2ac2198d7b305794280d73207b

selinux: clean up error path in policydb_init() Commit e0ac568de1fa ("selinux: reduce the use of hard-coded hash sizes") moved symtab initialization out of policydb_init(), but left the cleanup of symtabs from the error path. This patch fixes the oversight. Suggested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Paul Moore

commit sha 5e729e111eaf37b7941c678cb84af62539a4799a

selinux: avtab_init() and cond_policydb_init() return void The avtab_init() and cond_policydb_init() functions always return zero so mark them as returning void and update the callers not to check for a return value. Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Richard Guy Briggs

commit sha 1320a4052ea11eb2879eb7361da15a106a780972

audit: trigger accompanying records when no rules present When there are no audit rules registered, mandatory records (config, etc.) are missing their accompanying records (syscall, proctitle, etc.). This is due to audit context dummy set on syscall entry based on absence of rules that signals that no other records are to be printed. Clear the dummy bit if any record is generated. The proctitle context and dummy checks are pointless since the proctitle record will not be printed if no syscall records are printed. Please see upstream github issue https://github.com/linux-audit/audit-kernel/issues/120 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

Stephen Smalley

commit sha 27978872179b815105082902b22c516359576673

MAINTAINERS: Update my email address Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com>

view details

push time in 6 days

push eventtorvalds/linux

Andreas Gruenbacher

commit sha badb55ec208adc4c406ed084f486deb1f9f5baa0

gfs2: Split gfs2_lm_withdraw into two functions Split gfs2_lm_withdraw into a function that prints an error message and a function that withdraws the filesystem. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>

view details

Andreas Gruenbacher

commit sha 8dc88ac68df89851488a60b8f1582fe466f41a64

gfs2: Report errors before withdraw In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before withdrawing the filesystem: otherwise, when we withdraw first and withdraw is configured to panic, we'll never get to the error reporting. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>

view details

Andreas Gruenbacher

commit sha d7e7ab3f1e2283ef956bb94fee09e80dbc1c46e9

gfs2: Remove usused cluster_wide arguments of gfs2_consist functions These arguments are always passed as 0, and they are never evaluated. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>

view details

Andreas Gruenbacher

commit sha a5ca2f1cb66b1ef93bbd58b55b22cb50163a03db

gfs2: Turn gfs2_consist into void functions Change the various gfs2_consist functions to return void. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>

view details

Andreas Gruenbacher

commit sha 8e28ef1f2fa176854f96fb0416f2aaf5516793d0

gfs2: Return bool from gfs2_assert functions The gfs2_assert functions only print messages when the filesystem hasn't been withdrawn yet, and they indicate whether or not they've printed something in their return value. However, none of the callers use that information, so simply return whether or not the assert has failed. (The gfs2_assert functions are still backwards; they return false when an assertion is true.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>

view details

Bob Peterson

commit sha 69511080bd6efd34f4e020fcde6cf73bb4a61af6

gfs2: Introduce concept of a pending withdraw File system withdraws can be delayed when inconsistencies are discovered when we cannot withdraw immediately, for example, when critical spin_locks are held. But delaying the withdraw can cause gfs2 to ignore the error and keep running for a short period of time. For example, an rgrp glock may be dequeued and demoted while there are still buffers that haven't been properly revoked, due to io errors writing to the journal. This patch introduces a new concept of a pending withdraw, which means an inconsistency has been discovered and we need to withdraw at the earliest possible opportunity. In these cases, we aren't quite withdrawn yet, but we still need to not dequeue glocks and other critical things. If we dequeue the glocks and the withdraw results in our journal being replayed, the replay could overwrite data that's been modified by a different node that acquired the glock in the meantime. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha 30fe70a85a909a23dcbc2c628ca6655b2c85e7a1

gfs2: clear ail1 list when gfs2 withdraws This patch fixes a bug in which function gfs2_log_flush can get into an infinite loop when a gfs2 file system is withdrawn. The problem is the infinite loop "for (;;)" in gfs2_log_flush which would never finish because the io error and subsequent withdraw prevented the items from being taken off the ail list. This patch tries to clean up the mess by allowing withdraw situations to move not-in-flight buffer_heads to the ail2 list, where they will be dealt with later. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha b3422cacdd7e623e473b4c3977f3ee65e1fed62f

gfs2: Rework how rgrp buffer_heads are managed Before this patch, the rgrp code had a serious problem related to how it managed buffer_heads for resource groups. The problem caused file system corruption, especially in cases of journal replay. When an rgrp glock was demoted to transfer ownership to a different cluster node, do_xmote() first calls rgrp_go_sync and then rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called gfs2_rgrp_brelse() that dropped the buffer_head reference count. In most cases, the reference count went to zero, which is right. However, there were other places where the buffers are handled differently. After rgrp_go_sync, do_xmote called rgrp_go_inval which called gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to truncate_inode_pages_range would get rid of the pages in memory, but only if the reference count drops to 0. Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL. So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer to the buffer_heads in cases where the reference count was still 1. Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time, it failed the check for "if (bi->bi_bh)" and thus failed to call brelse a second time. Because of that, the reference count on those buffers sometimes failed to drop from 1 to 0. And that caused function truncate_inode_pages_range to keep the pages in page cache rather than freeing them. The next time the rgrp glock was acquired, the metadata read of the rgrp buffers re-used the pages in memory, which were now wrong because they were likely modified by the other node who acquired the glock in EX (which is why we demoted the glock). This re-use of the page cache caused corruption because changes made by the other nodes were never seen, so the bitmaps were inaccurate. For some reason, the problem became most apparent when journal replay forced the replay of rgrps in memory, which caused newer rgrp data to be overwritten by the older in-core pages. A big part of the problem was that the rgrp buffer were released in multiple places: The go_unlock function would release them when the glock was released rather than when the glock is demoted, which is clearly wrong because our intent was to cache them until the glock is demoted from SH or EX. This patch attempts to clean up the mess and make one consistent and centralized mechanism for managing the rgrp buffer_heads by implementing several changes: 1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync. We don't want to release the buffers or zero the pointers when syncing for the reasons stated above. It only makes sense to release them when the glock is actually invalidated (go_inval). And when we do, then we set the bh pointers to NULL. 2. The go_unlock function (which was only used for rgrps) is eliminated, as we've talked about doing many times before. The go_unlock function was called too early in the glock dq process, and should not happen until the glock is invalidated. 3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd. That will now happen automatically when the rgrp glocks are demoted, and shouldn't happen any sooner or later than that. Instead, function gfs2_clear_rgrpd has been modified to demote the rgrp glocks, and therefore, free those pages, before the remaining glocks are culled by gfs2_gl_hash_clear. This prevents the gl_object from hanging around when the glocks are culled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha 036330c914365f449ead353ef152fb29411cd4cb

gfs2: log error reform Before this patch, gfs2 kept track of journal io errors in two places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags. This patch consolidates the two into sd_log_error so that it reflects the first error encountered writing to the journal. In future patches, we will take advantage of this by checking this value rather than having to check both when reacting to io errors. In addition, this fixes a tight loop in unmount: If buffers get on the ail1 list and an io error occurs elsewhere, the ail1 list would never be cleared because they were always busy. So unmount would hang, waiting for the ail1 list to empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha f34a6135ce723cf7940729ab0b2607a753ebb580

gfs2: Only complain the first time an io error occurs in quota or log Before this patch, all io errors received by the quota daemon or the logd daemon would cause a complaint message to be issued, such as: gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0 This patch changes it so that the error message is only issued the first time the error is encountered. Also, before this patch function gfs2_end_log_write did not set the sd_log_error value, so log errors would not cause the file system to be withdrawn. This patch sets the error code so the file system is properly withdrawn if an io error is encountered writing to the journal. WARNING: This change in function breaks check xfstests generic/441 and causes it to fail: io errors writing to the log should cause a file system to be withdrawn, and no further operations are tolerated. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha 03678a99d13831b938fdc9d81b5a7608fc6ef416

gfs2: Ignore dlm recovery requests if gfs2 is withdrawn When a node fails, user space informs dlm of the node failure, and dlm instructs gfs2 on the surviving nodes to perform journal recovery. It does this by calling various callback functions in lock_dlm.c. To mark its progress, it keeps generation numbers and recover bits in a dlm "control" lock lvb, which is seen by all nodes to determine which journals need to be replayed. The gfs2 on all nodes get the same recovery requests from dlm, so they all try to do the recovery, but only one will be granted the exclusive lock on the journal. The others fail with a "Busy" message on their "try lock." However, when a node is withdrawn, it cannot safely do any recovery or replay any journals. To make matters worse, gfs2 might withdraw as a result of attempting recovery. For example, this might happen if the device goes offline, or if an hba fails. But in today's gfs2 code, it doesn't check for being withdrawn at any step in the recovery process. What's worse is that these callbacks from dlm have no return code, so there is no way to indicate failure back to dlm. We can send a "Recovery failed" uevent eventually, but that tells user space what happened, not dlm's kernel code. Before this patch, lock_dlm would perform its recovery steps but ignore the result, and eventually it would still update its generation number in the lvb, despite the fact that it may have withdrawn or encountered an error. The other nodes would then see the newer generation number in the lvb and conclude that they don't need to do recovery because the generation number is newer than the last one they saw. They think a different node has already recovered the journal. This patch adds checks to several of the callbacks used by dlm in its recovery state machine so that the functions are ignored and skipped if an io error has occurred or if the file system is withdrawn. That prevents the lvb bits from being updated, and therefore dlm and user space still see the need for recovery to take place. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha 0d91061a372671aec1af729686ad9241a59fc328

gfs2: move check_journal_clean to util.c for future use Before this patch function check_journal_clean was in ops_fstype.c. This patch moves it to util.c so we can make use of it elsewhere in a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha a72d2401f54b7db41c77ab971238a06eafe929fb

gfs2: Allow some glocks to be used during withdraw We need to allow some glocks to be enqueued, dequeued, promoted, and demoted when we're withdrawn. For example, to maintain metadata integrity, we should disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like iopen or the transaction glocks may be safely used because none of their metadata goes through the journal. So in general, we should disallow all glocks with an address space, and allow all the others. One exception is: we need to allow our active journal to be demoted so others may recover it. Allowing glocks after withdraw gives us the ability to take appropriate action (in a following patch) to have our journal properly replayed by another node rather than just abandoning the current transactions and pretending nothing bad happened, leaving the other nodes free to modify the blocks we had in our journal, which may result in file system corruption. Signed-off-by: Bob Peterson <rpeterso@redhat.com>

view details

Bob Peterson

commit sha 601ef0d52e9617588fcff3df26953592f2eb44ac

gfs2: Force withdraw to replay journals and wait for it to finish When a node withdraws from a file system, it often leaves its journal in an incomplete state. This is especially true when the withdraw is caused by io errors writing to the journal. Before this patch, a withdraw would try to write a "shutdown" record to the journal, tell dlm it's done with the file system, and none of the other nodes know about the problem. Later, when the problem is fixed and the withdrawn node is rebooted, it would then discover that its own journal was incomplete, and replay it. However, replaying it at this point is almost guaranteed to introduce corruption because the other nodes are likely to have used affected resource groups that appeared in the journal since the time of the withdraw. Replaying the journal later will overwrite any changes made, and not through any fault of dlm, which was instructed during the withdraw to release those resources. This patch makes file system withdraws seen by the entire cluster. Withdrawing nodes dequeue their journal glock to allow recovery. The remaining nodes check all the journals to see if they are clean or in need of replay. They try to replay dirty journals, but only the journals of withdrawn nodes will be "not busy" and therefore available for replay. Until the journal replay is complete, no i/o related glocks may be given out, to ensure that the replay does not cause the aforementioned corruption: We cannot allow any journal replay to overwrite blocks associated with a glock once it is held. The "live" glock which is now used to signal when a withdraw occurs. When a withdraw occurs, the node signals its withdraw by dequeueing the "live" glock and trying to enqueue it in EX mode, thus forcing the other nodes to all see a demote request, by way of a "1CB" (one callback) try lock. The "live" glock is not granted in EX; the callback is only just used to indicate a withdraw has occurred. Note that all nodes in the cluster must wait for the recovering node to finish replaying the withdrawing node's journal before continuing. To this end, it checks that the journals are clean multiple times in a retry loop. Also note that the withdraw function may be called from a wide variety of situations, and therefore, we need to take extra precautions to make sure pointers are valid before using them in many circumstances. We also need to take care when glocks decide to withdraw, since the withdraw code now uses glocks. Also, before this patch, if a process encountered an error and decided to withdraw, if another process was already withdrawing, the second withdraw would be silently ignored, which set it free to unlock its glocks. That's correct behavior if the original withdrawer encounters further errors down the road. But if secondary waiters don't wait for the journal replay, unlocking glocks will allow other nodes to use them, despite the fact that the journal containing those blocks is being replayed. The replay needs to finish before our glocks are released to other nodes. IOW, secondary withdraws need to wait for the first withdraw to finish. For example, if an rgrp glock is unlocked by a process that didn't wait for the first withdraw, a journal replay could introduce file system corruption by replaying a rgrp block that has already been granted to a different cluster node. Signed-off-by: Bob Peterson <rpeterso@redhat.com>

view details

Bob Peterson

commit sha 33dbd1e41a1dd549eb19a29477119d4e29766210

gfs2: fix infinite loop when checking ail item count before go_inval Before this patch, the rgrp_go_inval and inode_go_inval functions each checked if there were any items left on the ail count (by way of a count), and if so, did a withdraw. But the withdraw code now uses glocks when changing the file system to read-only status. So we can not have glock functions withdrawing or a hang will likely result: The glocks can't be serviced by the work_func if the work_func is busy doing its own withdraw. This patch removes the checks from the go_inval functions and adds a centralized check in do_xmote to warn about the problem and not withdraw, but flag the error so it's eventually caught when the logd daemon eventually runs. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha 7d9f9249580e05a9af823cdbb7a91f2758d1c63b

gfs2: Add verbose option to check_journal_clean Before this patch, function check_journal_clean would give messages related to journal recovery. That's fine for mount time, but when a node withdraws and forces replay that way, we don't want all those distracting and misleading messages. This patch adds a new parameter to make those messages optional. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha 5e4c7632aae1cce137792647f4fb6f599d1da893

gfs2: Issue revokes more intelligently Before this patch, function gfs2_write_revokes would call gfs2_ail1_empty, then traverse the sd_ail1_list looking for transactions that had bds which were no longer queued to a glock. And if it found some, it would try to issue revokes for them, up to a predetermined maximum. There were two problems with how it did this. First was the fact that gfs2_ail1_empty moves transactions which have nothing remaining on the ail1 list from the sd_ail1_list to the sd_ail2_list, thus making its traversal of sd_ail1_list miss them completely, and therefore, never issue revokes for them. Second was the fact that there were three traversals (or partial traversals) of the sd_ail1_list, each of which took and then released the sd_ail_lock lock: First inside gfs2_ail1_empty, second to determine if there are any revokes to be issued, and third to actually issue them. All this taking and releasing of the sd_ail_lock meant other processes could modify the lists and the conditions in which we're working. This patch simplies the whole process by adding a new parameter to function gfs2_ail1_empty, max_revokes. For normal calls, this is passed in as 0, meaning we don't want to issue any revokes. For function gfs2_write_revokes, we pass in the maximum number of revokes we can, thus allowing gfs2_ail1_empty to add the revokes where needed. This simplies the code, allows for a single holding of the sd_ail_lock, and allows gfs2_ail1_empty to add revokes for all the necessary bd items without missing any. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha f05b86db314df9f31c4c21153338f6a38b1f0de7

gfs2: Prepare to withdraw as soon as an IO error occurs in log write Before this patch, function gfs2_end_log_write would detect any IO errors writing to the journal and put out an appropriate message, but it never set a withdrawing condition. Eventually, the log daemon would see the error and determine it was time to withdraw, but in the meantime, other processes could continue running as if nothing bad ever happened. The biggest consequence is that __gfs2_glock_put would BUG() when it saw that there were still unwritten items. This patch sets the WITHDRAWING status as soon as an IO error is detected, and that way, the BUG will be avoided so the file system can be properly withdrawn and unmounted. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha d93ae386ef3d1bf4d683f870ad8e838159ec451d

gfs2: Check for log write errors before telling dlm to unlock Before this patch, function do_xmote just assumed all the writes submitted to the journal were finished and successful, and it called the go_unlock function to release the dlm lock. But if they're not, and a revoke failed to make its way to the journal, a journal replay on another node will cause corruption if we let the go_inval function continue and tell dlm to release the glock to another node. This patch adds a couple checks for errors in do_xmote after the calls to go_sync and go_inval. If an error is found, we cannot withdraw yet, because the withdraw itself uses glocks to make the file system read-only. Instead, we flag the error. Later, asserts should cause another node to replay the journal before continuing, thus protecting rgrp and dinode glocks and maintaining the integrity of the metadata. Note that we only need to do this for journaled glocks. System glocks should be able to progress even under withdrawn conditions. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

Bob Peterson

commit sha 9ff78289356af640941bbb0dd3f46af2063f0046

gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Before this patch, if gfs2_ail_empty_gl saw there was nothing on the ail list, it would return and not flush the log. The problem is that there could still be a revoke for the rgrp sitting on the sd_log_le_revoke list that's been recently taken off the ail list. But that revoke still needs to be written, and the rgrp_go_inval still needs to call log_flush_wait to ensure the revokes are all properly written to the journal before we relinquish control of the glock to another node. If we give the glock to another node before we have this knowledge, the node might crash and its journal replayed, in which case the missing revoke would allow the journal replay to replay the rgrp over top of the rgrp we already gave to another node, thus overwriting its changes and corrupting the file system. This patch makes gfs2_ail_empty_gl still call gfs2_log_flush rather than returning. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

view details

push time in 6 days

push eventtorvalds/linux

Yu-cheng Yu

commit sha c12e13dcd814023a903399ec5ac2e7082d664b8b

x86/fpu/xstate: Fix last_good_offset in setup_xstate_features() The function setup_xstate_features() uses CPUID to find each xfeature's standard-format offset and size. Since XSAVES always uses the compacted format, supervisor xstates are *NEVER* in the standard-format and their offsets are left as -1's. However, they are still being tracked as last_good_offset. Fix it by tracking only user xstate offsets. [ bp: Use xfeature_is_supervisor() and save an indentation level. Drop now unused xfeature_is_user(). ] Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20200109211452.27369-2-yu-cheng.yu@intel.com

view details

Arvind Sankar

commit sha 48bfdb9deffdc6b683feb25e15f4f26aac503501

x86/boot/compressed/64: Use LEA to initialize boot stack pointer It's shorter, and it's what is used in every other place, so make it consistent. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200107194436.2166846-2-nivedita@alum.mit.edu

view details

Arvind Sankar

commit sha a86255fe5258714e1f7c1bdfe95f08e4d098d450

x86/boot/compressed/64: Use 32-bit (zero-extended) MOV for z_output_len z_output_len is the size of the decompressed payload (i.e. vmlinux + vmlinux.relocs) and is generated as an unsigned 32-bit quantity by mkpiggy.c. The current movq $z_output_len, %r9 instruction generates a sign-extended move to %r9. Using movl $z_output_len, %r9d will instead zero-extend into %r9, which is appropriate for an unsigned 32-bit quantity. This is also what is already done for z_input_len, the size of the compressed payload. [ bp: Also, z_output_len cannot be a 64-bit quantity because it participates in: init_size: .long INIT_SIZE # kernel initialization size through INIT_SIZE which is a 32-bit quantity determined by the .long directive (vs .quad for 64-bit). Furthermore, if it really must be a 64-bit quantity, then the insn must be MOVABS which can accommodate a 64-bit immediate and which the toolchain does not generate automatically. ] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200211173333.1722739-1-nivedita@alum.mit.edu

view details

Yu-cheng Yu

commit sha 49a91d61aed1db01097b51a24c77137eb348a0bf

x86/fpu/xstate: Fix XSAVES offsets in setup_xstate_comp() In setup_xstate_comp(), each XSAVES component offset starts from the end of its preceding component plus alignment. A disabled feature does not take space and its offset should be set to the end of its preceding one with no alignment. However, in this case, alignment is incorrectly added to the offset, which can cause the next component to have a wrong offset. This problem has not been visible because currently there is no xfeature requiring alignment. Fix it by tracking the next starting offset only from enabled xfeatures. To make it clear, also change the function name to setup_xstate_comp_offsets(). [ bp: Fix a typo in the comment above it, while at it. ] Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20200109211452.27369-3-yu-cheng.yu@intel.com

view details

Yu-cheng Yu

commit sha e70b100806d63fb79775858ea92e1a716da46186

x86/fpu/xstate: Warn when checking alignment of disabled xfeatures An XSAVES component's alignment/offset is meaningful only when the feature is enabled. Return zero and WARN_ONCE on checking alignment of disabled features. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20200109211452.27369-4-yu-cheng.yu@intel.com

view details

Al Viro

commit sha c8e3dd86600a1a7b165478cc626d69bf07967c15

x86 user stack frame reads: switch to explicit __get_user() rather than relying upon the magic in raw_copy_from_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Al Viro

commit sha a4814443993c7c8686036bb8ca0df6abc499d63f

x86 kvm page table walks: switch to explicit __get_user() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

view details

Martin Molnar

commit sha 4d1d0977a2156a1dafe8f1cd890ab918c803485b

x86: Fix a handful of typos Fix a couple of typos in code comments. [ bp: While at it: s/IRQ's/IRQs/. ] Signed-off-by: Martin Molnar <martin.molnar.programming@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lkml.kernel.org/r/0819a044-c360-44a4-f0b6-3f5bafe2d35c@gmail.com

view details

Benjamin Thiel

commit sha cdcb58cc05ed9a7f74509a023762488c111cdfb3

x86/iopl: Include prototype header for ksys_ioperm() .. in order to fix a -Wmissing-prototype warning. No functional change. Signed-off-by: Benjamin Thiel <b.thiel@posteo.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200123133051.5974-1-b.thiel@posteo.de

view details

Benjamin Thiel

commit sha 99ce3255fddf08fca9249488793c9b5c909ff0ab

x86/syscalls: Add prototypes for C syscall callbacks .. in order to fix a couple of -Wmissing-prototypes warnings. No functional change. [ bp: Massage commit message and drop newlines. ] Signed-off-by: Benjamin Thiel <b.thiel@posteo.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200123152754.20149-1-b.thiel@posteo.de

view details

Benjamin Thiel

commit sha b10c307f6f314c068814d0e23c86f06d5d57004b

x86/cpu: Move prototype for get_umwait_control_msr() to a global location .. in order to fix a -Wmissing-prototypes warning. No functional change. Signed-off-by: Benjamin Thiel <b.thiel@posteo.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: kvm@vger.kernel.org Link: https://lkml.kernel.org/r/20200123172945.7235-1-b.thiel@posteo.de

view details

Arvind Sankar

commit sha 3ee372ccce4d4e7c610748d0583979d3ed3a0cf4

x86/boot/compressed/64: Remove .bss/.pgtable from bzImage Commit 5b11f1cee579 ("x86, boot: straighten out ranges to copy/zero in compressed/head*.S") introduced a separate .pgtable section, splitting it out from the rest of .bss. This section was added without the writeable flag, marking it as read-only. This results in the linker putting the .rela.dyn section (containing bogus dynamic relocations from head_64.o) after the .bss and .pgtable sections. When objcopy is used to convert compressed/vmlinux into a binary for the bzImage: $ objcopy -O binary -R .note -R .comment -S arch/x86/boot/compressed/vmlinux \ arch/x86/boot/vmlinux.bin the .bss and .pgtable sections get materialized as ~176KiB of zero bytes in the binary in order to place .rela.dyn at the correct location. Fix this by marking .pgtable as writeable. This moves the .rela.dyn section up in the ELF image layout so that .bss and .pgtable are the last allocated sections and so don't appear in bzImage. [ bp: Massage commit message. ] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20200109150218.16544-1-nivedita@alum.mit.edu

view details

Michal Simek

commit sha dcf639fe6f801f6418357ec20376b2218f666f44

microblaze: Kernel parameters should be parsed earlier Kernel command line should be parsed before cma initialization to be able to get cma sizes from command line. That's why call parse_early_param() before dma_continugous_reserve(). Unfortunately it can't be called earlier in machine_early_init() because if earlycon is passed in the command line the parse_early_param() attempts an ioremap which fails as the memory params are not set yet. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com>

view details

Arvind Sankar

commit sha 0eea39a234dc52063d14541fabcb2c64516a2328

x86/boot/compressed: Remove .eh_frame section from bzImage Discarding unnecessary sections with "*(*)" (see thread at Link: below) works fine with the bfd linker but fails with lld: $ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage ld.lld: error: discarding .shstrtab section is not allowed lld tries to also discard essential sections like .shstrtab, .symtab and .strtab, which results in the link failing since .shstrtab is required by the ELF specification: the e_shstrndx field in the ELF header is the index of .shstrtab, and each section in the section table is required to have an sh_name that points into the .shstrtab. .symtab and .strtab are also necessary to generate the zoffset.h file for the bzImage header. Since the only sizeable section that can be discarded is .eh_frame, restrict the discard to only .eh_frame to be safe. [ bp: Flesh out commit message and replace offending commit with this one. ] Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lkml.kernel.org/r/20200109150218.16544-2-nivedita@alum.mit.edu

view details

Dave Hansen

commit sha 16171bffc829272d5e6014bad48f680cb50943d9

x86/pkeys: Add check for pkey "overflow" Alex Shi reported the pkey macros above arch_set_user_pkey_access() to be unused. They are unused, and even refer to a nonexistent CONFIG option. But, they might have served a good use, which was to ensure that the code does not try to set values that would not fit in the PKRU register. As it stands, a too-large 'pkey' value would be likely to silently overflow the u32 new_pkru_bits. Add a check to look for overflows. Also add a comment to remind any future developer to closely examine the types used to store pkey values if arch_max_pkey() ever changes. This boots and passes the x86 pkey selftests. Reported-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200122165346.AD4DA150@viggo.jf.intel.com

view details

Michal Simek

commit sha 5119c418f950016eebe7a303e5903a239acaac09

microblaze: Fix _reset() function There is a need to disable VM before jump to zero reset vector. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com>

view details

Michal Simek

commit sha 4726dd6082bc960a20b761428eafb34b8af075b5

microblaze: Convert headers to SPDX license Covert all headers to SPDX. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>

view details

Michal Simek

commit sha 59d85c0a369672aeb08b911c26516801672af72c

microblaze: Remove architecture tlb.h and use generic one There is no reason to have this empty file if it is just link to asm-generic/tlb.h. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>

view details

Michal Simek

commit sha cfbd8d1979af64663571baaa389c394b5955b02d

microblaze: Remove early printk setup Early printk has been removed already that's why this setting doesn't make any sense. Also change printk level from pr_info to pr_err. Fixes: 96f0e6fcc9ad ("microblaze: remove redundant early_printk support") Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>

view details

Michal Simek

commit sha 7e8f54cd4e2628fada942fe9ba1fc46e99e94218

microblaze: Remove empty headers There is no reason to keep empty headers if asm-generic one can be used instead. That's why remove empty file and use asm-generic equivalent. cputable.h is completely unused that's why remove it. Signed-off-by: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Stefan Asserhall <stefan.asserhall@xilinx.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>

view details

push time in 6 days

push eventtorvalds/linux

Peter Zijlstra

commit sha 43f0f97dd6f033709079d4346d384160dc3fc68f

m68k: mm: Remove stray nocache in ColdFire pgalloc Since ColdFire V4e is a software TLB-miss architecture, there is no need for page-tables to be mapped uncached. Remove this stray nocache_page() dance, which isn't paired with a cache_page() and looks like a copy/paste/edit fail. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.481739981@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Will Deacon

commit sha fd1aa6303c4d6f3c1489eadc5cc37e581c3b41ea

m68k: mm: Fix ColdFire pgd_alloc() I also notice that building for m5475evb_defconfig with vanilla v5.5 triggers this scary looking warning due to a mismatch between the pgd size and the (8k!) page size: | In function 'pgd_alloc.isra.111', | inlined from 'mm_alloc_pgd' at kernel/fork.c:634:12, | inlined from 'mm_init.isra.112' at kernel/fork.c:1043:6: | ./arch/m68k/include/asm/string.h:72:25: warning: '__builtin_memcpy' forming offset [4097, 8192] is out of the bounds [0, 4096] of object 'kernel_pg_dir' with type 'pgd_t[1024]' {aka 'struct <anonymous>[1024]'} [-Warray-bounds] | #define memcpy(d, s, n) __builtin_memcpy(d, s, n) | ^~~~~~~~~~~~~~~~~~~~~~~~~ | ./arch/m68k/include/asm/mcf_pgalloc.h:93:2: note: in expansion of macro 'memcpy' | memcpy(new_pgd, swapper_pg_dir, PAGE_SIZE); | ^~~~~~ Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.540057688@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Peter Zijlstra

commit sha 13076a29d52e91d29ab6b13e7279c9eacd0b6dbb

m68k: mm: Unify Motorola MMU page setup Seeing how there are 5 copies of this magic code, one of which is unexplainably different, unify and document things. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.597688427@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Peter Zijlstra

commit sha 5ad272abee9fe0a781d49b03f334c0a3b3e418c1

m68k: mm: Move the pointer table allocator to motorola.c Only the Motorola MMU makes use of this allocator, it is a waste of .text to include it for Sun3/ColdFire. Also, this is going to avoid build issues when we're going to make it more Motorola specific. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.654652162@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Peter Zijlstra

commit sha ef22d8abd876e805b604e8f655127de2beee2869

m68k: mm: Restructure Motorola MMU page-table layout The Motorola 68xxx MMUs, 040 (and later) have a fixed 7,7,{5,6} page-table setup, where the last depends on the page-size selected (8k vs 4k resp.), and head.S selects 4K pages. For 030 (and earlier) we explicitly program 7,7,6 and 4K pages in %tc. However, the current code implements this mightily weird. What it does is group 16 of those (6 bit) pte tables into one 4k page to not waste space. The down-side is that that forces pmd_t to be a 16-tuple pointing to consecutive pte tables. This breaks the generic code which assumes READ_ONCE(*pmd) will be word sized. Therefore implement a straight forward 7,7,6 3 level page-table setup, with the addition (for 020/030) of (partial) large-page support. For now this increases the memory footprint for pte-tables 15 fold. Tested with ARAnyM/68040 emulation. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.711478295@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Peter Zijlstra

commit sha ef9285f69f0efbc75d01cbb09fe65882effd0a25

m68k: mm: Improve kernel_page_table() With the PTE-tables now only being 256 bytes, allocating a full page for them is a giant waste. Start by improving the boot time allocator such that init_mm initialization will at least have optimal memory density. Much thanks to Will Deacon in help with debugging and ferreting out lost information on these dusty MMUs. Notes: - _TABLE_MASK is reduced to account for the shorter (256 byte) alignment of pte-tables, per the manual, table entries should only ever have state in the low 4 bits (Used,WrProt,Desc1,Desc0) so it is still longer than strictly required. (Thanks Will!!!) - Also use kernel_page_table() for the 020/030 zero_pgtable case and consequently remove the zero_pgtable init hack (will fix up later). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.768263973@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Peter Zijlstra

commit sha 61c64a25ae8df45c2cd2f76343e20c3d266382ea

m68k: mm: Use table allocator for pgtables With the new page-table layout, using full (4k) pages for (256 byte) pte-tables is immensely wastefull. Move the pte-tables over to the same allocator already used for the (512 byte) higher level tables (pgd/pmd). This reduces the pte-table waste from 15x to 2x. Due to no longer being bound to 16 consecutive tables, this might actually already be more efficient than the old code for sparse tables. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.825295149@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Peter Zijlstra

commit sha 0e071ee6815692a3b241bbe9a9a29f7cdec023ed

m68k: mm: Extend table allocator for multiple sizes In addition to the PGD/PMD table size (128*4) add a PTE table size (64*4) to the table allocator. This completely removes the pte-table overhead compared to the old code, even for dense tables. Notes: - the allocator gained a list_empty() check to deal with there not being any pages at all. - the free mask is extended to cover more than the 8 bits required for the (512 byte) PGD/PMD tables. - NR_PAGETABLE accounting is restored. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.882175409@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Peter Zijlstra

commit sha 518a6b58243aff43361c1f027697dfe5a89cbc4f

m68k: mm: Fully initialize the page-table allocator Also iterate the PMD tables to populate the PTE table allocator. This also fully replaces the previous zero_pgtable hack. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.938797587@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Will Deacon

commit sha de9e354e1f8f975092e6635a0596cf0b9a6b82ac

m68k: mm: Change ColdFire pgtable_t To match what we did to the Motorola MMU routines, change the ColdFire pgalloc. The result is that ColdFire and Sun3 pgalloc are actually very similar and could conceivably be unified. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://lore.kernel.org/r/20200131125403.995781825@infradead.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

view details

Geert Uytterhoeven

commit sha d2936bd02b196f1c12bca1175e8fb09fefdfa47f

MIPS: ath79: Replace <linux/clk-provider.h> by <linux/of_clk.h> The Atheros 7/9xxx platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Paul Cercueil <paul@crapouillou.net> Cc: James Hartley <james.hartley@sondrel.com> Cc: John Crispin <john@phrozen.org> Cc: linux-mips@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-clk@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

Geert Uytterhoeven

commit sha e40b3deff7aff22666a3f18c175f4a0f9da79c35

MIPS: BMIPS: Replace <linux/clk-provider.h> by <linux/of_clk.h> The Broadcom BMIPS platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Cercueil <paul@crapouillou.net> Cc: James Hartley <james.hartley@sondrel.com> Cc: John Crispin <john@phrozen.org> Cc: linux-mips@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-clk@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

Geert Uytterhoeven

commit sha 089a792c750dde659169c0b7ca9f1d056ee44c89

MIPS: generic: Replace <linux/clk-provider.h> by <linux/of_clk.h> The generic MIPS platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Paul Cercueil <paul@crapouillou.net> Cc: James Hartley <james.hartley@sondrel.com> Cc: John Crispin <john@phrozen.org> Cc: linux-mips@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-clk@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

Geert Uytterhoeven

commit sha 3a94afc689475691bdddc664b33c1ffcbfb00263

MIPS: jz4740: Replace <linux/clk-provider.h> by <linux/of_clk.h> The Ingenic JZ4740 platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Reviewed-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: James Hartley <james.hartley@sondrel.com> Cc: John Crispin <john@phrozen.org> Cc: linux-mips@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-clk@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

Geert Uytterhoeven

commit sha 071cec1bfe1f5158ce8254c7cf2dc0be973efa84

MIPS: pic32mzda: Replace <linux/clk-provider.h> by <linux/of_clk.h> The Microchip PIC32MZDA platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Paul Cercueil <paul@crapouillou.net> Cc: James Hartley <james.hartley@sondrel.com> Cc: John Crispin <john@phrozen.org> Cc: linux-mips@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-clk@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

Geert Uytterhoeven

commit sha 97e04ea15fd5324d9e10343fdece698b26365d82

MIPS: Pistachio: Replace <linux/clk-provider.h> by <linux/of_clk.h> The Pistachio platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Acked-by: James Hartley <james.hartley@sondrel.com> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Paul Cercueil <paul@crapouillou.net> Cc: John Crispin <john@phrozen.org> Cc: linux-mips@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-clk@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

Geert Uytterhoeven

commit sha 9926108f799aaae0acffaf3dc7f62063423b25ea

MIPS: ralink: Replace <linux/clk-provider.h> by <linux/of_clk.h> The Ralink platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Acked-by: John Crispin <john@phrozen.org> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Paul Cercueil <paul@crapouillou.net> Cc: James Hartley <james.hartley@sondrel.com> Cc: linux-mips@vger.kernel.org Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-clk@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

Krzysztof Kozlowski

commit sha f6541f347bba6edbcbb1c930f802bb80b0c56468

MIPS: configs: Cleanup old Kconfig options CONFIG_MTD_NAND_IDS is gone and not needed (part of CONFIG_MTD_NAND) since commit f16bd7ca0457 ("mtd: nand: Kill the MTD_NAND_IDS Kconfig option"). CONFIG_IOSCHED_DEADLINE, CONFIG_IOSCHED_CFQ and CONFIG_DEFAULT_NOOP are gone since commit f382fb0bcef4 ("block: remove legacy IO schedulers"). The IOSCHED_DEADLINE was replaced by MQ_IOSCHED_DEADLINE and it will be now enabled by default (along with MQ_IOSCHED_KYBER). The BFQ_GROUP_IOSCHED is the only multiqueue scheduler which comes with group scheduling so select it in configs previously choosing CFQ_GROUP_IOSCHED. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: bcm-kernel-feedback-list@broadcom.com Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org

view details

Finn Thain

commit sha c584f9532115aecf24ba126dd5c528c994b6950f

fbdev/g364fb: Fix build failure This patch resolves these compiler errors and warnings -- CC drivers/video/fbdev/g364fb.o drivers/video/fbdev/g364fb.c: In function 'g364fb_cursor': drivers/video/fbdev/g364fb.c:137:9: error: 'x' undeclared (first use in this function) drivers/video/fbdev/g364fb.c:137:9: note: each undeclared identifier is reported only once for each function it appears in drivers/video/fbdev/g364fb.c:137:7: error: implicit declaration of function 'fontwidth' [-Werror=implicit-function-declaration] drivers/video/fbdev/g364fb.c:137:23: error: 'p' undeclared (first use in this function) drivers/video/fbdev/g364fb.c:137:38: error: 'y' undeclared (first use in this function) drivers/video/fbdev/g364fb.c:137:7: error: implicit declaration of function 'fontheight' [-Werror=implicit-function-declaration] drivers/video/fbdev/g364fb.c: In function 'g364fb_init': drivers/video/fbdev/g364fb.c:233:24: error: 'fbvar' undeclared (first use in this function) drivers/video/fbdev/g364fb.c:234:24: error: 'xres' undeclared (first use in this function) drivers/video/fbdev/g364fb.c:201:14: warning: unused variable 'j' [-Wunused-variable] drivers/video/fbdev/g364fb.c:197:25: warning: unused variable 'pal_ptr' [-Wunused-variable] The MIPS Magnum framebuffer console now works when tested in QEMU. Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org

view details

Finn Thain

commit sha a7047b8dd0980fb3bc803818aeeb686b6a33d1d5

mips/jazz: Remove redundant settings and shrink jazz_defconfig Remove some redundant assignments, that have no effect on 'make jazz_defconfig': CONFIG_INET_XFRM_MODE_TRANSPORT=m CONFIG_INET_XFRM_MODE_TUNNEL=m CONFIG_CRYPTO_HMAC=y Also drop the settings relating to crypto, wireless, advanced networking etc. The Kconfig defaults for these options are fine. This reduces the size of vmlinux so it can be launched by "NetBSD/arc Bootstrap, Revision 1.1", which is conveniently available on NetBSD/arc 5.1 ISO images. Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Paul Burton <paulburton@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org

view details

push time in 6 days

push eventtorvalds/linux

Thomas Gleixner

commit sha 7c805795307b40af50a45b7db44dd09ac1700947

x86/entry: Remove _TIF_NOHZ from _TIF_WORK_SYSCALL_ENTRY Evaluating _TIF_NOHZ to decide whether to use the slow syscall entry path is not only pointless, it's actually counterproductive: 1) Context tracking code is invoked unconditionally before that flag is evaluated. 2) If the flag is set the slow path is invoked for nothing due to #1 Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

view details

Frederic Weisbecker

commit sha 490f561b783dac2c4825e288e6dbbf83481eea34

context-tracking: Introduce CONFIG_HAVE_TIF_NOHZ A few archs (x86, arm, arm64) don't rely anymore on TIF_NOHZ to call into context tracking on user entry/exit but instead use static keys (or not) to optimize those calls. Ideally every arch should migrate to that behaviour in the long run. Settle a config option to let those archs remove their TIF_NOHZ definitions. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paulburton@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net>

view details

Frederic Weisbecker

commit sha 68d875131e43a8eed193810c1dfebb027fe2450e

x86: Remove TIF_NOHZ Static keys have replaced TIF_NOHZ to optimize the calls to context tracking. We can now safely remove that thread flag. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org>

view details

Frederic Weisbecker

commit sha 1acb2249ee380f56ffb4200ac85bfe233d7a8ca3

arm: Remove TIF_NOHZ Arm entry code calls context tracking from fast path. TIF_NOHZ is unused and can be safely removed. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@armlinux.org.uk>

view details

Thomas Gleixner

commit sha 50e818715821b89c7abac90a97721f106e893d83

x86/vdso: Mark the TSC clocksource path likely Jumping out of line for the TSC clcoksource read is creating awful code. TSC is likely to be the clocksource at least on bare metal and the PV interfaces are sufficiently more work that the jump over the TSC read is just in the noise. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124402.328922847@linutronix.de

view details

Thomas Gleixner

commit sha 78560d41064ad3d377e3d1a1ee87526301f4e946

ARM: vdso: Remove unused function The function is nowhere used. Aside of that this check should only cover the high resolution parts of the VDSO which require a VDSO capable clocksource and not the complete functionality as the name suggests. Will be replaced with something more useful. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124402.438179009@linutronix.de

view details

Thomas Gleixner

commit sha 1dff4156d1f63b525c54aea7f097a657cbbbf837

lib/vdso: Allow the high resolution parts to be compiled out If the architecture knows at compile time that there is no VDSO capable clocksource supported it makes sense to optimize the guts of the high resolution parts of the VDSO out at build time. Add a helper function to check whether the VDSO should be high resolution capable and provide a stub which can be overridden by an architecture. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124402.530143168@linutronix.de

view details

Thomas Gleixner

commit sha 3280badbe1b289622ce12b94d064ddb624cbaef1

ARM: vdso: Compile high resolution parts conditionally If the architected timer is disabled in the kernel configuration then let the core VDSO code drop the high resolution parts at compile time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124402.622587341@linutronix.de

view details

Thomas Gleixner

commit sha 25a2a6567829119f5e3e11eb0ce3d8ae985b6019

MIPS: vdso: Compile high resolution parts conditionally If neither the R4K nor the GIC timer is enabled in the kernel configuration then let the core VDSO code drop the high resolution parts at compile time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124402.714585315@linutronix.de

view details

Thomas Gleixner

commit sha 3bd142a46b561a12408e8db78cc6d62eb1c6b84e

clocksource: Cleanup struct clocksource and documentation Reformat the struct definition, add missing member documentation. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124402.825471920@linutronix.de

view details

Thomas Gleixner

commit sha eec399dd862762b9594df3659f15839a4e12f17a

x86/vdso: Move VDSO clocksource state tracking to callback All architectures which use the generic VDSO code have their own storage for the VDSO clock mode. That's pointless and just requires duplicate code. X86 abuses the function which retrieves the architecture specific clock mode storage to mark the clocksource as used in the VDSO. That's silly because this is invoked on every tick when the VDSO data is updated. Move this functionality to the clocksource::enable() callback so it gets invoked once when the clocksource is installed. This allows to make the clock mode storage generic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> (Hyper-V parts) Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts) Acked-by: Juergen Gross <jgross@suse.com> (Xen parts) Link: https://lkml.kernel.org/r/20200207124402.934519777@linutronix.de

view details

Thomas Gleixner

commit sha 5d51bee725cc1497352d6b0b604e42a90c680540

clocksource: Add common vdso clock mode storage All architectures which use the generic VDSO code have their own storage for the VDSO clock mode. That's pointless and just requires duplicate code. Provide generic storage for it. The new Kconfig symbol is intermediate and will be removed once all architectures are converted over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.028046322@linutronix.de

view details

Thomas Gleixner

commit sha b95a8a27c300d1a39a4e36f63a518ef36e4b966c

x86/vdso: Use generic VDSO clock mode storage Switch to the generic VDSO clock mode storage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> (VDSO parts) Acked-by: Juergen Gross <jgross@suse.com> (Xen parts) Acked-by: Paolo Bonzini <pbonzini@redhat.com> (KVM parts) Link: https://lkml.kernel.org/r/20200207124403.152039903@linutronix.de

view details

Thomas Gleixner

commit sha e1bdb22ebe5363ed75ddedf836ca9f19e1195337

mips: vdso: Use generic VDSO clock mode storage Switch to the generic VDSO clock mode storage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.244684017@linutronix.de

view details

Thomas Gleixner

commit sha 5e3c6a312a0946d2d83e32359612cbb925a8bed0

ARM/arm64: vdso: Use common vdso clock mode storage Convert ARM/ARM64 to the generic VDSO clock mode storage. This needs to happen in one go as they share the clocksource driver. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.363235229@linutronix.de

view details

Thomas Gleixner

commit sha f86fd32db706613fe8d0104057efa6e83e0d7e8f

lib/vdso: Cleanup clock mode storage leftovers Now that all architectures are converted to use the generic storage the helpers and conditionals can be removed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.470699892@linutronix.de

view details

Thomas Gleixner

commit sha c7a18100bdffdff440c7291db6e80863fab0461e

lib/vdso: Avoid highres update if clocksource is not VDSO capable If the current clocksource is not VDSO capable there is no point in updating the high resolution parts of the VDSO data. Replace the architecture specific check with a check for a VDSO capable clocksource and skip the update if there is none. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.563379423@linutronix.de

view details

Thomas Gleixner

commit sha 2d6b01bd88ccabba06d342ef80eaab6b39d12497

lib/vdso: Move VCLOCK_TIMENS to vdso_clock_modes Move the time namespace indicator clock mode to the other ones for consistency sake. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.656097274@linutronix.de

view details

Christophe Leroy

commit sha ae12e08539de6717502c2f9f83bd60df939b5c08

lib/vdso: Allow fixed clock mode Some architectures have a fixed clocksource which is known at compile time and cannot be replaced or disabled at runtime, e.g. timebase on PowerPC. For such cases the clock mode check in the VDSO code is pointless. Move the check for a VDSO capable clocksource into an inline function and allow architectures to redefine it via a macro. [ tglx: Removed the #ifdef mess ] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lkml.kernel.org/r/20200207124403.748756829@linutronix.de

view details

Christophe Leroy

commit sha 8345228ccf31f94e3ff7ec5458ac7cc13cb323fa

lib/vdso: Allow architectures to override the ns shift operation On powerpc/32, GCC (8.1) generates pretty bad code for the ns >>= vd->shift operation taking into account that the shift is always <= 32 and the upper part of the result is likely to be zero. GCC makes reversed assumptions considering the shift to be likely >= 32 and the upper part to be like not zero. unsigned long long shift(unsigned long long x, unsigned char s) { return x >> s; } results in: 00000018 <shift>: 18: 35 25 ff e0 addic. r9,r5,-32 1c: 41 80 00 10 blt 2c <shift+0x14> 20: 7c 64 4c 30 srw r4,r3,r9 24: 38 60 00 00 li r3,0 28: 4e 80 00 20 blr 2c: 54 69 08 3c rlwinm r9,r3,1,0,30 30: 21 45 00 1f subfic r10,r5,31 34: 7c 84 2c 30 srw r4,r4,r5 38: 7d 29 50 30 slw r9,r9,r10 3c: 7c 63 2c 30 srw r3,r3,r5 40: 7d 24 23 78 or r4,r9,r4 44: 4e 80 00 20 blr Even when forcing the shift to be smaller than 32 with an &= 31, it still considers the shift as likely >= 32. Move the default shift implementation into an inline which can be redefined in architecture code via a macro. [ tglx: Made the shift argument u32 and removed the __arch prefix ] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/b3d449de856982ed060a71e6ace8eeca4654e685.1580399657.git.christophe.leroy@c-s.fr Link: https://lkml.kernel.org/r/20200207124403.857649978@linutronix.de

view details

push time in 7 days

push eventtorvalds/linux

Giovanni Gherdovich

commit sha 1567c3e3467cddeb019a7b53ec632f834b6a9239

x86, sched: Add support for frequency invariance Implement arch_scale_freq_capacity() for 'modern' x86. This function is used by the scheduler to correctly account usage in the face of DVFS. The present patch addresses Intel processors specifically and has positive performance and performance-per-watt implications for the schedutil cpufreq governor, bringing it closer to, if not on-par with, the powersave governor from the intel_pstate driver/framework. Large performance gains are obtained when the machine is lightly loaded and no regression are observed at saturation. The benchmarks with the largest gains are kernel compilation, tbench (the networking version of dbench) and shell-intensive workloads. 1. FREQUENCY INVARIANCE: MOTIVATION * Without it, a task looks larger if the CPU runs slower 2. PECULIARITIES OF X86 * freq invariance accounting requires knowing the ratio freq_curr/freq_max 2.1 CURRENT FREQUENCY * Use delta_APERF / delta_MPERF * freq_base (a.k.a "BusyMHz") 2.2 MAX FREQUENCY * It varies with time (turbo). As an approximation, we set it to a constant, i.e. 4-cores turbo frequency. 3. EFFECTS ON THE SCHEDUTIL FREQUENCY GOVERNOR * The invariant schedutil's formula has no feedback loop and reacts faster to utilization changes 4. KNOWN LIMITATIONS * In some cases tasks can't reach max util despite how hard they try 5. PERFORMANCE TESTING 5.1 MACHINES * Skylake, Broadwell, Haswell 5.2 SETUP * baseline Linux v5.2 w/ non-invariant schedutil. Tested freq_max = 1-2-3-4-8-12 active cores turbo w/ invariant schedutil, and intel_pstate/powersave 5.3 BENCHMARK RESULTS 5.3.1 NEUTRAL BENCHMARKS * NAS Parallel Benchmark (HPC), hackbench 5.3.2 NON-NEUTRAL BENCHMARKS * tbench (10-30% better), kernbench (10-15% better), shell-intensive-scripts (30-50% better) * no regressions 5.3.3 SELECTION OF DETAILED RESULTS 5.3.4 POWER CONSUMPTION, PERFORMANCE-PER-WATT * dbench (5% worse on one machine), kernbench (3% worse), tbench (5-10% better), shell-intensive-scripts (10-40% better) 6. MICROARCH'ES ADDRESSED HERE * Xeon Core before Scalable Performance processors line (Xeon Gold/Platinum etc have different MSRs semantic for querying turbo levels) 7. REFERENCES * MMTests performance testing framework, github.com/gormanm/mmtests +-------------------------------------------------------------------------+ | 1. FREQUENCY INVARIANCE: MOTIVATION +-------------------------------------------------------------------------+ For example; suppose a CPU has two frequencies: 500 and 1000 Mhz. When running a task that would consume 1/3rd of a CPU at 1000 MHz, it would appear to consume 2/3rd (or 66.6%) when running at 500 MHz, giving the false impression this CPU is almost at capacity, even though it can go faster [*]. In a nutshell, without frequency scale-invariance tasks look larger just because the CPU is running slower. [*] (footnote: this assumes a linear frequency/performance relation; which everybody knows to be false, but given realities its the best approximation we can make.) +-------------------------------------------------------------------------+ | 2. PECULIARITIES OF X86 +-------------------------------------------------------------------------+ Accounting for frequency changes in PELT signals requires the computation of the ratio freq_curr / freq_max. On x86 neither of those terms is readily available. 2.1 CURRENT FREQUENCY ==================== Since modern x86 has hardware control over the actual frequency we run at (because amongst other things, Turbo-Mode), we cannot simply use the frequency as requested through cpufreq. Instead we use the APERF/MPERF MSRs to compute the effective frequency over the recent past. Also, because reading MSRs is expensive, don't do so every time we need the value, but amortize the cost by doing it every tick. 2.2 MAX FREQUENCY ================= Obtaining freq_max is also non-trivial because at any time the hardware can provide a frequency boost to a selected subset of cores if the package has enough power to spare (eg: Turbo Boost). This means that the maximum frequency available to a given core changes with time. The approach taken in this change is to arbitrarily set freq_max to a constant value at boot. The value chosen is the "4-cores (4C) turbo frequency" on most microarchitectures, after evaluating the following candidates: * 1-core (1C) turbo frequency (the fastest turbo state available) * around base frequency (a.k.a. max P-state) * something in between, such as 4C turbo To interpret these options, consider that this is the denominator in freq_curr/freq_max, and that ratio will be used to scale PELT signals such as util_avg and load_avg. A large denominator will undershoot (util_avg looks a bit smaller than it really is), viceversa with a smaller denominator PELT signals will tend to overshoot. Given that PELT drives frequency selection in the schedutil governor, we will have: freq_max set to | effect on DVFS --------------------+------------------ 1C turbo | power efficiency (lower freq choices) base freq | performance (higher util_avg, higher freq requests) 4C turbo | a bit of both 4C turbo proves to be a good compromise in a number of benchmarks (see below). +-------------------------------------------------------------------------+ | 3. EFFECTS ON THE SCHEDUTIL FREQUENCY GOVERNOR +-------------------------------------------------------------------------+ Once an architecture implements a frequency scale-invariant utilization (the PELT signal util_avg), schedutil switches its frequency selection formula from freq_next = 1.25 * freq_curr * util [non-invariant util signal] to freq_next = 1.25 * freq_max * util [invariant util signal] where, in the second formula, freq_max is set to the 1C turbo frequency (max turbo). The advantage of the second formula, whose usage we unlock with this patch, is that freq_next doesn't depend on the current frequency in an iterative fashion, but can jump to any frequency in a single update. This absence of feedback in the formula makes it quicker to react to utilization changes and more robust against pathological instabilities. Compare it to the update formula of intel_pstate/powersave: freq_next = 1.25 * freq_max * Busy% where again freq_max is 1C turbo and Busy% is the percentage of time not spent idling (calculated with delta_MPERF / delta_TSC); essentially the same as invariant schedutil, and largely responsible for intel_pstate/powersave good reputation. The non-invariant schedutil formula is derived from the invariant one by approximating util_inv with util_raw * freq_curr / freq_max, but this has limitations. Testing shows improved performances due to better frequency selections when the machine is lightly loaded, and essentially no change in behaviour at saturation / overutilization. +-------------------------------------------------------------------------+ | 4. KNOWN LIMITATIONS +-------------------------------------------------------------------------+ It's been shown that it is possible to create pathological scenarios where a CPU-bound task cannot reach max utilization, if the normalizing factor freq_max is fixed to a constant value (see [Lelli-2018]). If freq_max is set to 4C turbo as we do here, one needs to peg at least 5 cores in a package doing some busywork, and observe that none of those task will ever reach max util (1024) because they're all running at less than the 4C turbo frequency. While this concern still applies, we believe the performance benefit of frequency scale-invariant PELT signals outweights the cost of this limitation. [Lelli-2018] https://lore.kernel.org/lkml/20180517150418.GF22493@localhost.localdomain/ +-------------------------------------------------------------------------+ | 5. PERFORMANCE TESTING +-------------------------------------------------------------------------+ 5.1 MACHINES ============ We tested the patch on three machines, with Skylake, Broadwell and Haswell CPUs. The details are below, together with the available turbo ratios as reported by the appropriate MSRs. * 8x-SKYLAKE-UMA: Single socket E3-1240 v5, Skylake 4 cores/8 threads Max EFFiciency, BASE frequency and available turbo levels (MHz): EFFIC 800 |******** BASE 3500 |*********************************** 4C 3700 |************************************* 3C 3800 |************************************** 2C 3900 |*************************************** 1C 3900 |*************************************** * 80x-BROADWELL-NUMA: Two sockets E5-2698 v4, 2x Broadwell 20 cores/40 threads Max EFFiciency, BASE frequency and available turbo levels (MHz): EFFIC 1200 |************ BASE 2200 |********************** 8C 2900 |***************************** 7C 3000 |****************************** 6C 3100 |******************************* 5C 3200 |******************************** 4C 3300 |********************************* 3C 3400 |********************************** 2C 3600 |************************************ 1C 3600 |************************************ * 48x-HASWELL-NUMA Two sockets E5-2670 v3, 2x Haswell 12 cores/24 threads Max EFFiciency, BASE frequency and available turbo levels (MHz): EFFIC 1200 |************ BASE 2300 |*********************** 12C 2600 |************************** 11C 2600 |************************** 10C 2600 |************************** 9C 2600 |************************** 8C 2600 |************************** 7C 2600 |************************** 6C 2600 |************************** 5C 2700 |*************************** 4C 2800 |**************************** 3C 2900 |***************************** 2C 3100 |******************************* 1C 3100 |******************************* 5.2 SETUP ========= * The baseline is Linux v5.2 with schedutil (non-invariant) and the intel_pstate driver in passive mode. * The rationale for choosing the various freq_max values to test have been to try all the 1-2-3-4C turbo levels (note that 1C and 2C turbo are identical on all machines), plus one more value closer to base_freq but still in the turbo range (8C turbo for both 80x-BROADWELL-NUMA and 48x-HASWELL-NUMA). * In addition we've run all tests with intel_pstate/powersave for comparison. * The filesystem is always XFS, the userspace is openSUSE Leap 15.1. * 8x-SKYLAKE-UMA is capable of HWP (Hardware-Managed P-States), so the runs with active intel_pstate on this machine use that. This gives, in terms of combinations tested on each machine: * 8x-SKYLAKE-UMA * Baseline: Linux v5.2, non-invariant schedutil, intel_pstate passive * intel_pstate active + powersave + HWP * invariant schedutil, freq_max = 1C turbo * invariant schedutil, freq_max = 3C turbo * invariant schedutil, freq_max = 4C turbo * both 80x-BROADWELL-NUMA and 48x-HASWELL-NUMA * [same as 8x-SKYLAKE-UMA, but no HWP capable] * invariant schedutil, freq_max = 8C turbo (which on 48x-HASWELL-NUMA is the same as 12C turbo, or "all cores turbo") 5.3 BENCHMARK RESULTS ===================== 5.3.1 NEUTRAL BENCHMARKS ------------------------ Tests that didn't show any measurable difference in performance on any of the test machines between non-invariant schedutil and our patch are: * NAS Parallel Benchmarks (NPB) using either MPI or openMP for IPC, any computational kernel * flexible I/O (FIO) * hackbench (using threads or processes, and using pipes or sockets) 5.3.2 NON-NEUTRAL BENCHMARKS ---------------------------- What follow are summary tables where each benchmark result is given a score. * A tilde (~) means a neutral result, i.e. no difference from baseline. * Scores are computed with the ratio result_new / result_baseline, so a tilde means a score of 1.00. * The results in the score ratio are the geometric means of results running the benchmark with different parameters (eg: for kernbench: using 1, 2, 4, ... number of processes; for pgbench: varying the number of clients, and so on). * The first three tables show higher-is-better kind of tests (i.e. measured in operations/second), the subsequent three show lower-is-better kind of tests (i.e. the workload is fixed and we measure elapsed time, think kernbench). * "gitsource" is a name we made up for the test consisting in running the entire unit tests suite of the Git SCM and measuring how long it takes. We take it as a typical example of shell-intensive serialized workload. * In the "I_PSTATE" column we have the results for intel_pstate/powersave. Other columns show invariant schedutil for different values of freq_max. 4C turbo is circled as it's the value we've chosen for the final implementation. 80x-BROADWELL-NUMA (comparison ratio; higher is better) +------+ I_PSTATE 1C 3C | 4C | 8C pgbench-ro 1.14 ~ ~ | 1.11 | 1.14 pgbench-rw ~ ~ ~ | ~ | ~ netperf-udp 1.06 ~ 1.06 | 1.05 | 1.07 netperf-tcp ~ 1.03 ~ | 1.01 | 1.02 tbench4 1.57 1.18 1.22 | 1.30 | 1.56 +------+ 8x-SKYLAKE-UMA (comparison ratio; higher is better) +------+ I_PSTATE/HWP 1C 3C | 4C | pgbench-ro ~ ~ ~ | ~ | pgbench-rw ~ ~ ~ | ~ | netperf-udp ~ ~ ~ | ~ | netperf-tcp ~ ~ ~ | ~ | tbench4 1.30 1.14 1.14 | 1.16 | +------+ 48x-HASWELL-NUMA (comparison ratio; higher is better) +------+ I_PSTATE 1C 3C | 4C | 12C pgbench-ro 1.15 ~ ~ | 1.06 | 1.16 pgbench-rw ~ ~ ~ | ~ | ~ netperf-udp 1.05 0.97 1.04 | 1.04 | 1.02 netperf-tcp 0.96 1.01 1.01 | 1.01 | 1.01 tbench4 1.50 1.05 1.13 | 1.13 | 1.25 +------+ In the table above we see that active intel_pstate is slightly better than our 4C-turbo patch (both in reference to the baseline non-invariant schedutil) on read-only pgbench and much better on tbench. Both cases are notable in which it shows that lowering our freq_max (to 8C-turbo and 12C-turbo on 80x-BROADWELL-NUMA and 48x-HASWELL-NUMA respectively) helps invariant schedutil to get closer. If we ignore active intel_pstate and focus on the comparison with baseline alone, there are several instances of double-digit performance improvement. 80x-BROADWELL-NUMA (comparison ratio; lower is better) +------+ I_PSTATE 1C 3C | 4C | 8C dbench4 1.23 0.95 0.95 | 0.95 | 0.95 kernbench 0.93 0.83 0.83 | 0.83 | 0.82 gitsource 0.98 0.49 0.49 | 0.49 | 0.48 +------+ 8x-SKYLAKE-UMA (comparison ratio; lower is better) +------+ I_PSTATE/HWP 1C 3C | 4C | dbench4 ~ ~ ~ | ~ | kernbench ~ ~ ~ | ~ | gitsource 0.92 0.55 0.55 | 0.55 | +------+ 48x-HASWELL-NUMA (comparison ratio; lower is better) +------+ I_PSTATE 1C 3C | 4C | 8C dbench4 ~ ~ ~ | ~ | ~ kernbench 0.94 0.90 0.89 | 0.90 | 0.90 gitsource 0.97 0.69 0.69 | 0.69 | 0.69 +------+ dbench is not very remarkable here, unless we notice how poorly active intel_pstate is performing on 80x-BROADWELL-NUMA: 23% regression versus non-invariant schedutil. We repeated that run getting consistent results. Out of scope for the patch at hand, but deserving future investigation. Other than that, we previously ran this campaign with Linux v5.0 and saw the patch doing better on dbench a the time. We haven't checked closely and can only speculate at this point. On the NUMA boxes kernbench gets 10-15% improvements on average; we'll see in the detailed tables that the gains concentrate on low process counts (lightly loaded machines). The test we call "gitsource" (running the git unit test suite, a long-running single-threaded shell script) appears rather spectacular in this table (gains of 30-50% depending on the machine). It is to be noted, however, that gitsource has no adjustable parameters (such as the number of jobs in kernbench, which we average over in order to get a single-number summary score) and is exactly the kind of low-parallelism workload that benefits the most from this patch. When looking at the detailed tables of kernbench or tbench4, at low process or client counts one can see similar numbers. 5.3.3 SELECTION OF DETAILED RESULTS ----------------------------------- Machine : 48x-HASWELL-NUMA Benchmark : tbench4 (i.e. dbench4 over the network, actually loopback) Varying parameter : number of clients Unit : MB/sec (higher is better) 5.2.0 vanilla (BASELINE) 5.2.0 intel_pstate 5.2.0 1C-turbo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Hmean 1 126.73 +- 0.31% ( ) 315.91 +- 0.66% ( 149.28%) 125.03 +- 0.76% ( -1.34%) Hmean 2 258.04 +- 0.62% ( ) 614.16 +- 0.51% ( 138.01%) 269.58 +- 1.45% ( 4.47%) Hmean 4 514.30 +- 0.67% ( ) 1146.58 +- 0.54% ( 122.94%) 533.84 +- 1.99% ( 3.80%) Hmean 8 1111.38 +- 2.52% ( ) 2159.78 +- 0.38% ( 94.33%) 1359.92 +- 1.56% ( 22.36%) Hmean 16 2286.47 +- 1.36% ( ) 3338.29 +- 0.21% ( 46.00%) 2720.20 +- 0.52% ( 18.97%) Hmean 32 4704.84 +- 0.35% ( ) 4759.03 +- 0.43% ( 1.15%) 4774.48 +- 0.30% ( 1.48%) Hmean 64 7578.04 +- 0.27% ( ) 7533.70 +- 0.43% ( -0.59%) 7462.17 +- 0.65% ( -1.53%) Hmean 128 6998.52 +- 0.16% ( ) 6987.59 +- 0.12% ( -0.16%) 6909.17 +- 0.14% ( -1.28%) Hmean 192 6901.35 +- 0.25% ( ) 6913.16 +- 0.10% ( 0.17%) 6855.47 +- 0.21% ( -0.66%) 5.2.0 3C-turbo 5.2.0 4C-turbo 5.2.0 12C-turbo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Hmean 1 128.43 +- 0.28% ( 1.34%) 130.64 +- 3.81% ( 3.09%) 153.71 +- 5.89% ( 21.30%) Hmean 2 311.70 +- 6.15% ( 20.79%) 281.66 +- 3.40% ( 9.15%) 305.08 +- 5.70% ( 18.23%) Hmean 4 641.98 +- 2.32% ( 24.83%) 623.88 +- 5.28% ( 21.31%) 906.84 +- 4.65% ( 76.32%) Hmean 8 1633.31 +- 1.56% ( 46.96%) 1714.16 +- 0.93% ( 54.24%) 2095.74 +- 0.47% ( 88.57%) Hmean 16 3047.24 +- 0.42% ( 33.27%) 3155.02 +- 0.30% ( 37.99%) 3634.58 +- 0.15% ( 58.96%) Hmean 32 4734.31 +- 0.60% ( 0.63%) 4804.38 +- 0.23% ( 2.12%) 4674.62 +- 0.27% ( -0.64%) Hmean 64 7699.74 +- 0.35% ( 1.61%) 7499.72 +- 0.34% ( -1.03%) 7659.03 +- 0.25% ( 1.07%) Hmean 128 6935.18 +- 0.15% ( -0.91%) 6942.54 +- 0.10% ( -0.80%) 7004.85 +- 0.12% ( 0.09%) Hmean 192 6901.62 +- 0.12% ( 0.00%) 6856.93 +- 0.10% ( -0.64%) 6978.74 +- 0.10% ( 1.12%) This is one of the cases where the patch still can't surpass active intel_pstate, not even when freq_max is as low as 12C-turbo. Otherwise, gains are visible up to 16 clients and the saturated scenario is the same as baseline. The scores in the summary table from the previous sections are ratios of geometric means of the results over different clients, as seen in this table. Machine : 80x-BROADWELL-NUMA Benchmark : kernbench (kernel compilation) Varying parameter : number of jobs Unit : seconds (lower is better) 5.2.0 vanilla (BASELINE) 5.2.0 intel_pstate 5.2.0 1C-turbo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 2 379.68 +- 0.06% ( ) 330.20 +- 0.43% ( 13.03%) 285.93 +- 0.07% ( 24.69%) Amean 4 200.15 +- 0.24% ( ) 175.89 +- 0.22% ( 12.12%) 153.78 +- 0.25% ( 23.17%) Amean 8 106.20 +- 0.31% ( ) 95.54 +- 0.23% ( 10.03%) 86.74 +- 0.10% ( 18.32%) Amean 16 56.96 +- 1.31% ( ) 53.25 +- 1.22% ( 6.50%) 48.34 +- 1.73% ( 15.13%) Amean 32 34.80 +- 2.46% ( ) 33.81 +- 0.77% ( 2.83%) 30.28 +- 1.59% ( 12.99%) Amean 64 26.11 +- 1.63% ( ) 25.04 +- 1.07% ( 4.10%) 22.41 +- 2.37% ( 14.16%) Amean 128 24.80 +- 1.36% ( ) 23.57 +- 1.23% ( 4.93%) 21.44 +- 1.37% ( 13.55%) Amean 160 24.85 +- 0.56% ( ) 23.85 +- 1.17% ( 4.06%) 21.25 +- 1.12% ( 14.49%) 5.2.0 3C-turbo 5.2.0 4C-turbo 5.2.0 8C-turbo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 2 284.08 +- 0.13% ( 25.18%) 283.96 +- 0.51% ( 25.21%) 285.05 +- 0.21% ( 24.92%) Amean 4 153.18 +- 0.22% ( 23.47%) 154.70 +- 1.64% ( 22.71%) 153.64 +- 0.30% ( 23.24%) Amean 8 87.06 +- 0.28% ( 18.02%) 86.77 +- 0.46% ( 18.29%) 86.78 +- 0.22% ( 18.28%) Amean 16 48.03 +- 0.93% ( 15.68%) 47.75 +- 1.99% ( 16.17%) 47.52 +- 1.61% ( 16.57%) Amean 32 30.23 +- 1.20% ( 13.14%) 30.08 +- 1.67% ( 13.57%) 30.07 +- 1.67% ( 13.60%) Amean 64 22.59 +- 2.02% ( 13.50%) 22.63 +- 0.81% ( 13.32%) 22.42 +- 0.76% ( 14.12%) Amean 128 21.37 +- 0.67% ( 13.82%) 21.31 +- 1.15% ( 14.07%) 21.17 +- 1.93% ( 14.63%) Amean 160 21.68 +- 0.57% ( 12.76%) 21.18 +- 1.74% ( 14.77%) 21.22 +- 1.00% ( 14.61%) The patch outperform active intel_pstate (and baseline) by a considerable margin; the summary table from the previous section says 4C turbo and active intel_pstate are 0.83 and 0.93 against baseline respectively, so 4C turbo is 0.83/0.93=0.89 against intel_pstate (~10% better on average). There is no noticeable difference with regard to the value of freq_max. Machine : 8x-SKYLAKE-UMA Benchmark : gitsource (time to run the git unit test suite) Varying parameter : none Unit : seconds (lower is better) 5.2.0 vanilla 5.2.0 intel_pstate/hwp 5.2.0 1C-turbo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 858.85 +- 1.16% ( ) 791.94 +- 0.21% ( 7.79%) 474.95 ( 44.70%) 5.2.0 3C-turbo 5.2.0 4C-turbo - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 475.26 +- 0.20% ( 44.66%) 474.34 +- 0.13% ( 44.77%) In this test, which is of interest as representing shell-intensive (i.e. fork-intensive) serialized workloads, invariant schedutil outperforms intel_pstate/powersave by a whopping 40% margin. 5.3.4 POWER CONSUMPTION, PERFORMANCE-PER-WATT --------------------------------------------- The following table shows average power consumption in watt for each benchmark. Data comes from turbostat (package average), which in turn is read from the RAPL interface on CPUs. We know the patch affects CPU frequencies so it's reasonable to ignore other power consumers (such as memory or I/O). Also, we don't have a power meter available in the lab so RAPL is the best we have. turbostat sampled average power every 10 seconds for the entire duration of each benchmark. We took all those values and averaged them (i.e. with don't have detail on a per-parameter granularity, only on whole benchmarks). 80x-BROADWELL-NUMA (power consumption, watts) +--------+ BASELINE I_PSTATE 1C 3C | 4C | 8C pgbench-ro 130.01 142.77 131.11 132.45 | 134.65 | 136.84 pgbench-rw 68.30 60.83 71.45 71.70 | 71.65 | 72.54 dbench4 90.25 59.06 101.43 99.89 | 101.10 | 102.94 netperf-udp 65.70 69.81 66.02 68.03 | 68.27 | 68.95 netperf-tcp 88.08 87.96 88.97 88.89 | 88.85 | 88.20 tbench4 142.32 176.73 153.02 163.91 | 165.58 | 176.07 kernbench 92.94 101.95 114.91 115.47 | 115.52 | 115.10 gitsource 40.92 41.87 75.14 75.20 | 75.40 | 75.70 +--------+ 8x-SKYLAKE-UMA (power consumption, watts) +--------+ BASELINE I_PSTATE/HWP 1C 3C | 4C | pgbench-ro 46.49 46.68 46.56 46.59 | 46.52 | pgbench-rw 29.34 31.38 30.98 31.00 | 31.00 | dbench4 27.28 27.37 27.49 27.41 | 27.38 | netperf-udp 22.33 22.41 22.36 22.35 | 22.36 | netperf-tcp 27.29 27.29 27.30 27.31 | 27.33 | tbench4 41.13 45.61 43.10 43.33 | 43.56 | kernbench 42.56 42.63 43.01 43.01 | 43.01 | gitsource 13.32 13.69 17.33 17.30 | 17.35 | +--------+ 48x-HASWELL-NUMA (power consumption, watts) +--------+ BASELINE I_PSTATE 1C 3C | 4C | 12C pgbench-ro 128.84 136.04 129.87 132.43 | 132.30 | 134.86 pgbench-rw 37.68 37.92 37.17 37.74 | 37.73 | 37.31 dbench4 28.56 28.73 28.60 28.73 | 28.70 | 28.79 netperf-udp 56.70 60.44 56.79 57.42 | 57.54 | 57.52 netperf-tcp 75.49 75.27 75.87 76.02 | 76.01 | 75.95 tbench4 115.44 139.51 119.53 123.07 | 123.97 | 130.22 kernbench 83.23 91.55 95.58 95.69 | 95.72 | 96.04 gitsource 36.79 36.99 39.99 40.34 | 40.35 | 40.23 +--------+ A lower power consumption isn't necessarily better, it depends on what is done with that energy. Here are tables with the ratio of performance-per-watt on each machine and benchmark. Higher is always better; a tilde (~) means a neutral ratio (i.e. 1.00). 80x-BROADWELL-NUMA (performance-per-watt ratios; higher is better) +------+ I_PSTATE 1C 3C | 4C | 8C pgbench-ro 1.04 1.06 0.94 | 1.07 | 1.08 pgbench-rw 1.10 0.97 0.96 | 0.96 | 0.97 dbench4 1.24 0.94 0.95 | 0.94 | 0.92 netperf-udp ~ 1.02 1.02 | ~ | 1.02 netperf-tcp ~ 1.02 ~ | ~ | 1.02 tbench4 1.26 1.10 1.06 | 1.12 | 1.26 kernbench 0.98 0.97 0.97 | 0.97 | 0.98 gitsource ~ 1.11 1.11 | 1.11 | 1.13 +------+ 8x-SKYLAKE-UMA (performance-per-watt ratios; higher is better) +------+ I_PSTATE/HWP 1C 3C | 4C | pgbench-ro ~ ~ ~ | ~ | pgbench-rw 0.95 0.97 0.96 | 0.96 | dbench4 ~ ~ ~ | ~ | netperf-udp ~ ~ ~ | ~ | netperf-tcp ~ ~ ~ | ~ | tbench4 1.17 1.09 1.08 | 1.10 | kernbench ~ ~ ~ | ~ | gitsource 1.06 1.40 1.40 | 1.40 | +------+ 48x-HASWELL-NUMA (performance-per-watt ratios; higher is better) +------+ I_PSTATE 1C 3C | 4C | 12C pgbench-ro 1.09 ~ 1.09 | 1.03 | 1.11 pgbench-rw ~ 0.86 ~ | ~ | 0.86 dbench4 ~ 1.02 1.02 | 1.02 | ~ netperf-udp ~ 0.97 1.03 | 1.02 | ~ netperf-tcp 0.96 ~ ~ | ~ | ~ tbench4 1.24 ~ 1.06 | 1.05 | 1.11 kernbench 0.97 0.97 0.98 | 0.97 | 0.96 gitsource 1.03 1.33 1.32 | 1.32 | 1.33 +------+ These results are overall pleasing: in plenty of cases we observe performance-per-watt improvements. The few regressions (read/write pgbench and dbench on the Broadwell machine) are of small magnitude. kernbench loses a few percentage points (it has a 10-15% performance improvement, but apparently the increase in power consumption is larger than that). tbench4 and gitsource, which benefit the most from the patch, keep a positive score in this table which is a welcome surprise; that suggests that in those particular workloads the non-invariant schedutil (and active intel_pstate, too) makes some rather suboptimal frequency selections. +-------------------------------------------------------------------------+ | 6. MICROARCH'ES ADDRESSED HERE +-------------------------------------------------------------------------+ The patch addresses Xeon Core processors that use MSR_PLATFORM_INFO and MSR_TURBO_RATIO_LIMIT to advertise their base frequency and turbo frequencies respectively. This excludes the recent Xeon Scalable Performance processors line (Xeon Gold, Platinum etc) whose MSRs have to be parsed differently. Subsequent patches will address: * Xeon Scalable Performance processors and Atom Goldmont/Goldmont Plus * Xeon Phi (Knights Landing, Knights Mill) * Atom Silvermont +-------------------------------------------------------------------------+ | 7. REFERENCES +-------------------------------------------------------------------------+ Tests have been run with the help of the MMTests performance testing framework, see github.com/gormanm/mmtests. The configuration file names for the benchmark used are: db-pgbench-timed-ro-small-xfs db-pgbench-timed-rw-small-xfs io-dbench4-async-xfs network-netperf-unbound network-tbench scheduler-unbound workload-kerndevel-xfs workload-shellscripts-xfs hpc-nas-c-class-mpi-full-xfs hpc-nas-c-class-omp-full All those benchmarks are generally available on the web: pgbench: https://www.postgresql.org/docs/10/pgbench.html netperf: https://hewlettpackard.github.io/netperf/ dbench/tbench: https://dbench.samba.org/ gitsource: git unit test suite, github.com/git/git NAS Parallel Benchmarks: https://www.nas.nasa.gov/publications/npb.html hackbench: https://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Doug Smythies <dsmythies@telus.net> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200122151617.531-2-ggherdovich@suse.cz

view details

Giovanni Gherdovich

commit sha 2a0abc59699896f03bf6f16efb8a3a490511216f

x86, sched: Add support for frequency invariance on SKYLAKE_X The scheduler needs the ratio freq_curr/freq_max for frequency-invariant accounting. On SKYLAKE_X CPUs set freq_max to the highest frequency that can be sustained by a group of at least 4 cores. From the changelog of commit 31e07522be56 ("tools/power turbostat: fix decoding for GLM, DNV, SKX turbo-ratio limits"): > Newer processors do not hard-code the the number of cpus in each bin > to {1, 2, 3, 4, 5, 6, 7, 8} Rather, they can specify any number > of CPUS in each of the 8 bins: > > eg. > > ... > 37 * 100.0 = 3600.0 MHz max turbo 4 active cores > 38 * 100.0 = 3700.0 MHz max turbo 3 active cores > 39 * 100.0 = 3800.0 MHz max turbo 2 active cores > 39 * 100.0 = 3900.0 MHz max turbo 1 active cores > > could now look something like this: > > ... > 37 * 100.0 = 3600.0 MHz max turbo 16 active cores > 38 * 100.0 = 3700.0 MHz max turbo 8 active cores > 39 * 100.0 = 3800.0 MHz max turbo 4 active cores > 39 * 100.0 = 3900.0 MHz max turbo 2 active cores This encoding of turbo levels applies to both SKYLAKE_X and GOLDMONT/GOLDMONT_D, but we treat these two classes in separate commits because their freq_max values need to be different. For SKX we prefer a lower freq_max in the ratio freq_curr/freq_max, allowing load and utilization to overshoot and the schedutil governor to be more performance-oriented. Models from the Atom series (such as GOLDMONT*) are handled in a forthcoming commit as they have to favor power-efficiency over performance. Results from a performance evaluation follow. 1. TEST SETUP 2. NEUTRAL BENCHMARKS 3. NON-NEUTRAL BENCHMARKS 4. DETAILED TABLES 1. TEST SETUP ------------- Test machine: CPU Model : Intel Xeon Platinum 8260L CPU @ 2.40GHz (a.k.a. Cascade Lake) Fam/Mod/Ste : 6:85:6 Topology : 2 sockets, 24 cores / 48 threads each socket Memory : 192G Storage : SSD, XFS filesystem Max EFFICiency, BASE frequency and available turbo levels (MHz): EFFIC 1000 |********** BASE 2400 |************************ 24C 3100 |******************************* 20C 3300 |********************************* 16C 3600 |************************************ 12C 3600 |************************************ 8C 3600 |************************************ 4C 3700 |************************************* 2C 3900 |*************************************** Tested kernels: Baseline : v5.2, intel_pstate passive, schedutil Comparison #1 : v5.2, intel_pstate active , powersave+HWP Comparison #2 : v5.2, this patch, intel_pstate passive, schedutil 2. NEUTRAL BENCHMARKS --------------------- * pgbench read/write * NASA Parallel Benchmarks (NPB), MPI or OpenMP for message-passing * hackbench * netperf 3. NON-NEUTRAL BENCHMARKS ------------------------- comparison ratio with baseline; 1.00 means neutral, higher is better: I_PSTATE FREQ-INV ---------------------------------------- pgbench read-only 1.10 ~ tbench 1.82 1.14 comparison ratio with baseline; 1.00 means neutral, lower is better: I_PSTATE FREQ-INV ---------------------------------------- dbench ~ 0.97 kernbench 0.88 0.78 gitsource[*] ~ 0.46 [*] "gitsource" consists in running git's unit tests tilde (~) means 1.00, ie result identical to baseline Performance per watt: performance-per-watt ratios with baseline; 1.00 means neutral, higher is better: I_PSTATE FREQ-INV ---------------------------------------- dbench 0.92 0.91 tbench 1.26 1.04 kernbench 0.95 0.96 gitsource 1.03 1.30 Similarly to earlier Xeons, measurable performance gains over non-invariant schedutil are observed on dbench, tbench, kernel compilation and running the git unit tests suite. Looking at the detailed tables show that the patch scores the largest difference when the machine is lightly loaded. Power efficiency suffers lightly on kernbench and a bit more on dbench, but largely improves on gitsource (which also runs considerably faster). For reference, we also report results using active intel_pstate with powersave and HWP; the largest gap between non-invariant schedutil and intel_pstate+powersave is still tbench, which runs 82% better and with 26% improved efficiency on the latter configuration -- this divide isn't closed yet by frequency-invariant schedutil. 4. DETAILED TABLES ------------------ Benchmark : tbench4 (i.e. dbench4 over the network, actually loopback) Varying parameter : number of clients Unit : MB/sec (higher is better) 5.2.0 vanilla (BASELINE) 5.2.0 intel_pstate/HWP 5.2.0 freq-inv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Hmean 1 183.56 +- 0.21% ( ) 516.12 +- 0.57% ( 181.18%) 185.59 +- 0.59% ( 1.11%) Hmean 2 365.75 +- 0.25% ( ) 1015.14 +- 0.33% ( 177.55%) 402.59 +- 4.48% ( 10.07%) Hmean 4 720.99 +- 0.44% ( ) 1951.75 +- 0.28% ( 170.70%) 738.39 +- 1.72% ( 2.41%) Hmean 8 1449.93 +- 0.34% ( ) 3830.56 +- 0.24% ( 164.19%) 1750.36 +- 4.65% ( 20.72%) Hmean 16 2874.26 +- 0.57% ( ) 7381.62 +- 0.53% ( 156.82%) 4348.35 +- 2.22% ( 51.29%) Hmean 32 6116.17 +- 5.10% ( ) 13013.05 +- 0.08% ( 112.76%) 8980.35 +- 0.66% ( 46.83%) Hmean 64 14485.04 +- 3.46% ( ) 17835.12 +- 0.35% ( 23.13%) 16540.73 +- 0.51% ( 14.19%) Hmean 128 30779.16 +- 3.20% ( ) 32796.94 +- 2.13% ( 6.56%) 31512.58 +- 0.20% ( 2.38%) Hmean 256 34664.66 +- 0.81% ( ) 34604.67 +- 0.46% ( -0.17%) 34943.70 +- 0.25% ( 0.80%) Hmean 384 33957.51 +- 0.11% ( ) 34091.50 +- 0.14% ( 0.39%) 33921.41 +- 0.09% ( -0.11%) Benchmark : kernbench (kernel compilation) Varying parameter : number of jobs Unit : seconds (lower is better) 5.2.0 vanilla (BASELINE) 5.2.0 intel_pstate/HWP 5.2.0 freq-inv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 2 332.94 +- 0.40% ( ) 260.16 +- 0.45% ( 21.86%) 233.56 +- 0.21% ( 29.85%) Amean 4 173.04 +- 0.43% ( ) 138.76 +- 0.03% ( 19.81%) 123.59 +- 0.11% ( 28.58%) Amean 8 89.65 +- 0.20% ( ) 73.54 +- 0.09% ( 17.97%) 65.69 +- 0.10% ( 26.72%) Amean 16 48.08 +- 1.41% ( ) 41.64 +- 1.61% ( 13.40%) 36.00 +- 1.80% ( 25.11%) Amean 32 28.78 +- 0.72% ( ) 26.61 +- 1.99% ( 7.55%) 23.19 +- 1.68% ( 19.43%) Amean 64 20.46 +- 1.85% ( ) 19.76 +- 0.35% ( 3.42%) 17.38 +- 0.92% ( 15.06%) Amean 128 18.69 +- 1.70% ( ) 17.59 +- 1.04% ( 5.90%) 15.73 +- 1.40% ( 15.85%) Amean 192 18.82 +- 1.01% ( ) 17.76 +- 0.77% ( 5.67%) 15.57 +- 1.80% ( 17.28%) Benchmark : gitsource (time to run the git unit test suite) Varying parameter : none Unit : seconds (lower is better) 5.2.0 vanilla (BASELINE) 5.2.0 intel_pstate/HWP 5.2.0 freq-inv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Amean 792.49 +- 0.20% ( ) 779.35 +- 0.24% ( 1.66%) 427.14 +- 0.16% ( 46.10%) Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200122151617.531-3-ggherdovich@suse.cz

view details

Giovanni Gherdovich

commit sha 8bea0dfb4a820ae063568a87cc2e7d8f587377af

x86, sched: Add support for frequency invariance on XEON_PHI_KNL/KNM The scheduler needs the ratio freq_curr/freq_max for frequency-invariant accounting. On Xeon Phi CPUs set freq_max to the second-highest frequency reported by the CPU. Xeon Phi CPUs such as Knights Landing and Knights Mill typically have either one or two turbo frequencies; in the former case that's 100 MHz above the base frequency, in the latter case the two levels are 100 MHz and 200 MHz above base frequency. We set freq_max to the second-highest frequency reported by the CPU. This could be the base frequency (if only one turbo level is available) or the first turbo level (if two levels are available). The rationale is to compromise between power efficiency or performance -- going straight to max turbo would favor efficiency and blindly using base freq would favor performance. For reference, this is how MSR_TURBO_RATIO_LIMIT must be parsed on a Xeon Phi to get the available frequencies (taken from a comment in turbostat's sources): [0] -- Reserved [7:1] -- Base value of number of active cores of bucket 1. [15:8] -- Base value of freq ratio of bucket 1. [20:16] -- +ve delta of number of active cores of bucket 2. i.e. active cores of bucket 2 = active cores of bucket 1 + delta [23:21] -- Negative delta of freq ratio of bucket 2. i.e. freq ratio of bucket 2 = freq ratio of bucket 1 - delta [28:24]-- +ve delta of number of active cores of bucket 3. [31:29]-- -ve delta of freq ratio of bucket 3. [36:32]-- +ve delta of number of active cores of bucket 4. [39:37]-- -ve delta of freq ratio of bucket 4. [44:40]-- +ve delta of number of active cores of bucket 5. [47:45]-- -ve delta of freq ratio of bucket 5. [52:48]-- +ve delta of number of active cores of bucket 6. [55:53]-- -ve delta of freq ratio of bucket 6. [60:56]-- +ve delta of number of active cores of bucket 7. [63:61]-- -ve delta of freq ratio of bucket 7. 1. PERFORMANCE EVALUATION: TBENCH +5% 2. NEUTRAL BENCHMARKS (ALL OTHERS) 3. TEST SETUP 1. PERFORMANCE EVALUATION: TBENCH +5% ------------------------------------- A performance evaluation was conducted on a Knights Mill machine (see "Test Setup" below), were the frequency-invariance patch (on schedutil) is compared to both non-invariant schedutil and active intel_pstate with powersave: all three tested kernels behave the same performance-wise and with regard to power consumption (performance per watt). The only notable difference is tbench: comparison ratio of performance with baseline; 1.00 means neutral, higher is better: I_PSTATE FREQ-INV ---------------------------------------- tbench 1.04 1.05 performance-per-watt ratios with baseline; 1.00 means neutral, higher is better: I_PSTATE FREQ-INV ---------------------------------------- tbench 1.03 1.04 which essentially means that frequency-invariant schedutil is 5% better than baseline, the same as intel_pstate+powersave. As the results above are averaged over the varying parameter, here the detailed table. Varying parameter : number of clients Unit : MB/sec (higher is better) 5.2.0 vanilla (BASELINE) 5.2.0 intel_pstate 5.2.0 freq-inv - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Hmean 1 49.06 +- 2.12% ( ) 51.66 +- 1.52% ( 5.30%) 52.87 +- 0.88% ( 7.76%) Hmean 2 93.82 +- 0.45% ( ) 103.24 +- 0.70% ( 10.05%) 105.90 +- 0.70% ( 12.88%) Hmean 4 192.46 +- 1.15% ( ) 215.95 +- 0.60% ( 12.21%) 215.78 +- 1.43% ( 12.12%) Hmean 8 406.74 +- 2.58% ( ) 438.58 +- 0.36% ( 7.83%) 437.61 +- 0.97% ( 7.59%) Hmean 16 857.70 +- 1.22% ( ) 890.26 +- 0.72% ( 3.80%) 889.11 +- 0.73% ( 3.66%) Hmean 32 1760.10 +- 0.92% ( ) 1791.70 +- 0.44% ( 1.79%) 1787.95 +- 0.44% ( 1.58%) Hmean 64 3183.50 +- 0.34% ( ) 3183.19 +- 0.36% ( -0.01%) 3187.53 +- 0.36% ( 0.13%) Hmean 128 4830.96 +- 0.31% ( ) 4846.53 +- 0.30% ( 0.32%) 4855.86 +- 0.30% ( 0.52%) Hmean 256 5467.98 +- 0.38% ( ) 5793.80 +- 0.28% ( 5.96%) 5821.94 +- 0.17% ( 6.47%) Hmean 512 5398.10 +- 0.06% ( ) 5745.56 +- 0.08% ( 6.44%) 5503.68 +- 0.07% ( 1.96%) Hmean 1024 5290.43 +- 0.63% ( ) 5221.07 +- 0.47% ( -1.31%) 5277.22 +- 0.80% ( -0.25%) Hmean 1088 5139.71 +- 0.57% ( ) 5236.02 +- 0.71% ( 1.87%) 5190.57 +- 0.41% ( 0.99%) 2. NEUTRAL BENCHMARKS (ALL OTHERS) ---------------------------------- * pgbench (both read/write and read-only) * NASA Parallel Benchmarks (NPB), MPI or OpenMP for message-passing * hackbench * netperf * dbench * kernbench * gitsource (git unit test suite) 3. TEST SETUP ------------- Test machine: CPU Model : Intel Xeon Phi CPU 7255 @ 1.10GHz (a.k.a. Knights Mill) Fam/Mod/Ste : 6:133:0 Topology : 1 socket, 68 cores / 272 threads Memory : 96G Storage : rotary, XFS filesystem Max EFFICiency, BASE frequency and available turbo levels (MHz): EFFIC 1000 |********** BASE 1100 |*********** 68C 1100 |*********** 30C 1200 |************ Tested kernels: Baseline : v5.2, intel_pstate passive, schedutil Comparison #1 : v5.2, intel_pstate active , powersave Comparison #2 : v5.2, this patch, intel_pstate passive, schedutil Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200122151617.531-4-ggherdovich@suse.cz

view details

Giovanni Gherdovich

commit sha eacf0474aec8bdccdc7f19386319127c67be3588

x86, sched: Add support for frequency invariance on ATOM_GOLDMONT* The scheduler needs the ratio freq_curr/freq_max for frequency-invariant accounting. On GOLDMONT (aka Apollo Lake), GOLDMONT_D (aka Denverton) and GOLDMONT_PLUS CPUs (aka Gemini Lake) set freq_max to the highest frequency reported by the CPU. The encoding of turbo ratios for GOLDMONT* is identical to the one for SKYLAKE_X, but we treat the Atom case apart because we want to set freq_max to a higher value, thus the ratio freq_curr/freq_max to be lower, leading to more conservative frequency selections (favoring power efficiency). Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200122151617.531-5-ggherdovich@suse.cz

view details

Giovanni Gherdovich

commit sha 298c6f99bf30ef735e79f7f6d086bdfae505d380

x86, sched: Add support for frequency invariance on ATOM The scheduler needs the ratio freq_curr/freq_max for frequency-invariant accounting. On all ATOM CPUs prior to Goldmont, set freq_max to the 1-core turbo ratio. We intended to perform tests validating that this patch doesn't regress in terms of energy efficiency, given that this is the primary concern on Atom processors. Alas, we found out that turbostat doesn't support reading RAPL interfaces on our test machine (Airmont), and we don't have external equipment to measure power consumption; all we have is the performance results of the benchmarks we ran. Test machine: Platform : Dell Wyse 3040 Thin Client[1] CPU Model : Intel Atom x5-Z8350 (aka Cherry Trail, aka Airmont) Fam/Mod/Ste : 6:76:4 Topology : 1 socket, 4 cores / 4 threads Memory : 2G Storage : onboard flash, XFS filesystem [1] https://www.dell.com/en-us/work/shop/wyse-endpoints-and-software/wyse-3040-thin-client/spd/wyse-3040-thin-client Base frequency and available turbo levels (MHz): Min Operating Freq 266 |*** Low Freq Mode 800 |******** Base Freq 2400 |************************ 4 Cores 2800 |**************************** 3 Cores 2800 |**************************** 2 Cores 3200 |******************************** 1 Core 3200 |******************************** Tested kernels: Baseline : v5.4-rc1, intel_pstate passive, schedutil Comparison #1 : v5.4-rc1, intel_pstate active , powersave Comparison #2 : v5.4-rc1, this patch, intel_pstate passive, schedutil tbench, hackbench and kernbench performed the same under all three kernels; dbench ran faster with intel_pstate/powersave and the git unit tests were a lot faster with intel_pstate/powersave and invariant schedutil wrt the baseline. Not that any of this is terrbily interesting anyway, one doesn't buy an Atom system to go fast. Power consumption regressions aren't expected but we lack the equipment to make that measurement. Turbostat seems to think that reading RAPL on this machine isn't a good idea and we're trusting that decision. comparison ratio of performance with baseline; 1.00 means neutral, lower is better: I_PSTATE