Custom NVMe Debug Driver
- Custom NVMe Debug Driver
- Building the Driver
- Using the Driver
- Original core.c
- Driver Modifications
- Circular Buffer Definition
- Macro Definition
- Structure Definition
- Global Variables
- Function Implementation
- Customize nvme_error_status
- Add a Custom Error Handling Function
- Modify the existing nvme_error_status
- Register Circular Buffer with /proc
- Copy Circular Buffer Data to /proc
- Register with /proc
- Module Cleanup
- Background Notes
Building the Driver
dnf update -y
STOP: If you received any updates at all, reboot. Failing to do so is likely to leave your kernel parameters mismatched during the build. Rather than trying to guess which updates could change the kernel symbols, it's easier to reboot. If you want to avoid rebooting, check whether any of the updates touched the kernel or the kernel source.
dnf install -y kernel-headers ncurses-devel tmux
dnf download --source kernel
rpm2cpio kernel-5.14.0-362.18.1.el9_3.src.rpm | cpio -idmv
tar -xf linux-5.14.0-362.18.1.el9_3.tar.xz
In the commands below I pull the Module.symvers
file from the kernel headers directory and .config
from the boot directory, so we are guaranteed to have the correct kernel parameters and symbols.
# Installs necessary packages for building and compiling the kernel and kernel modules
dnf install -y rpm-build rpmdevtools git python3-devel make gcc flex bison kernel-headers ncurses-devel tmux elfutils-libelf-devel openssl-devel bc kernel-devel-$(uname -r) dwarves
# Sets up the RPM build environment directories under ~/rpmbuild
rpmdev-setuptree
# Downloads the source RPM package for the kernel
dnf download --source kernel
# Installs the source RPM package, which contains the exact kernel source used to build this version of Rocky
rpm -ivh kernel-5.14.0-362.18.1.el9_3.src.rpm
# In RPM terms, the SPECS directory has everything needed for the build, including the source code
# and OS-specific kernel patches
cd /root/rpmbuild/SPECS
# Executes the pre-build steps of the kernel build process, preparing the environment for the actual build,
# including applying Rocky-specific patches
rpmbuild -bp kernel.spec
cd /root/rpmbuild/BUILD/kernel-5.14.0-362.18.1.el9_3/linux-5.14.0-362.18.1.el9.x86_64/
# Copies the current kernel's configuration file to the build directory as the base
# configuration for the new kernel. This includes all the compile options used for the original kernel build
cp -f /boot/config-$(uname -r) .config
# Copies the Module symbol versioning information from the current kernel for compatibility.
# This is how we make sure symbols match for any exported functions
cp -f /usr/src/kernels/$(uname -r)/Module.symvers .
Before continuing you need to edit drivers/nvme/host/core.c
, replacing it with the contents of the modified core.c
file. If you are running on a 6.x kernel, use this version instead. After the modifications are complete, run:
make -j$(nproc --all) scripts prepare modules_prepare
make ARCH=x86_64 -j$(nproc --all) M=drivers/nvme
# nvme-common must be loaded first because it exports several functions to nvme-core;
# if you try to load core first you will get unresolved symbol errors
insmod drivers/nvme/common/nvme-common.ko
insmod drivers/nvme/host/nvme-core.ko
# nvme.ko is what makes the drives actually show up on Linux
insmod drivers/nvme/host/nvme.ko
If you want the driver's minor version string to match the running kernel, add -362.18.1.el9_3.x86_64
to CONFIG_LOCALVERSION in .config.
Command Description
make clean
: This command cleans out the build directory. It removes files generated by the previous build, ensuring a fresh start. This step is crucial to prevent any potential interference from previous build attempts, which could include improperly configured build artifacts or leftover files from a different kernel version.
make -j$(nproc --all) modules_prepare
: This prepares the kernel build system to build external modules. It generates all the dependency files and performs any necessary setup without actually compiling the entire kernel or any modules. This step is necessary because it ensures that all the scripts and tools required for building modules are correctly prepared and that the kernel build environment is consistent.
make -j$(nproc --all) oldconfig && make -j$(nproc --all) prepare
: These commands process the configuration file to ensure it is up-to-date (oldconfig) and then prepare the kernel build environment based on this configuration (prepare). This includes setting up various autogenerated header files and making sure the build system's state is consistent with the configuration you're trying to build against.
make ARCH=x86_64 -j$(nproc --all) M=drivers/nvme
: Finally, this command compiles the NVMe module with the correct architecture specified, ensuring that the vermagic string will match the running kernel, and using multiple cores (-j$(nproc --all)) to speed up the compilation.
See Background Notes for more information.
Using the Driver
Once the driver is compiled and loaded, you should see output like the following in dmesg:
[ 719.193448] Custom debug driver finished loading.
[ 719.197730] nvme nvme2: pci function 0000:43:00.0
[ 719.198065] nvme nvme3: pci function 0000:44:00.0
[ 719.198370] nvme nvme1: pci function 0000:42:00.0
[ 719.198573] nvme nvme4: pci function 0000:c6:00.0
[ 719.200146] nvme nvme0: pci function 0000:41:00.0
[ 719.211623] nvme nvme1: Shutdown timeout set to 15 seconds
[ 719.216810] nvme nvme0: Shutdown timeout set to 15 seconds
[ 719.221537] nvme nvme3: Shutdown timeout set to 10 seconds
[ 719.221548] nvme nvme2: Shutdown timeout set to 10 seconds
[ 719.243116] nvme nvme1: 128/0/0 default/read/poll queues
[ 719.247593] nvme nvme0: 128/0/0 default/read/poll queues
[ 719.259487] nvme nvme3: 128/0/0 default/read/poll queues
[ 719.265318] nvme nvme2: 128/0/0 default/read/poll queues
[ 719.292959] nvme1n1: p1 p2
[ 719.293468] nvme0n1: p1 p2
[ 719.296170] nvme3n1: p1 p2
[ 719.296982] nvme2n1: p1 p5 p6 p7 p8
[ 720.830122] nvme nvme4: Shutdown timeout set to 10 seconds
[ 720.851600] nvme nvme4: 128/0/0 default/read/poll queues
You can check that it is working by looking for its entry on the proc file system:
cat /proc/nvme_circular_buffer
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
[4295386259] Circular buffer initialized and working. You will see this message multiple times. I just put it here because otherwise you just see "[0]" which may be confusing.
I used the above message because otherwise the output just reads [0]
, which looks like we're logging successes when in fact the buffer is simply initialized to zero.
When there is an error, the driver logs to both /proc and dmesg:
NOTE: To be clear, it will not do this by default. I only did this for testing, and left the code in place so you can uncomment it.
[root@nvme linux-5.14.0-362.18.1.el9.x86_64]# dmesg
[ 966.884545] log_specific_nvme_error: logging error status 0x0
[ 966.884581] log_specific_nvme_error: logging error status 0x0
[ 966.884653] log_specific_nvme_error: logging error status 0x0
[ 966.884689] log_specific_nvme_error: logging error status 0x0
[ 966.884760] log_specific_nvme_error: logging error status 0x0
[ 966.884797] log_specific_nvme_error: logging error status 0x0
[ 966.884869] log_specific_nvme_error: logging error status 0x0
[ 966.884905] log_specific_nvme_error: logging error status 0x0
[ 966.884977] log_specific_nvme_error: logging error status 0x0
[ 966.885012] log_specific_nvme_error: logging error status 0x0
[ 966.885086] log_specific_nvme_error: logging error status 0x0
[ 966.885122] log_specific_nvme_error: logging error status 0x0
[ 966.885195] log_specific_nvme_error: logging error status 0x0
[ 966.885231] log_specific_nvme_error: logging error status 0x0
[ 966.885303] log_specific_nvme_error: logging error status 0x0
[ 966.885339] log_specific_nvme_error: logging error status 0x0
[ 966.885411] log_specific_nvme_error: logging error status 0x0
[ 966.885446] log_specific_nvme_error: logging error status 0x0
[root@nvme linux-5.14.0-362.18.1.el9.x86_64]# cat /proc/nvme_circular_buffer
[4295632365] We have success! Our driver works!
[4295632365] We have success! Our driver works!
[4295632365] We have success! Our driver works!
[4295632365] We have success! Our driver works!
[4295632365] We have success! Our driver works!
[4295632365] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
[4295632366] We have success! Our driver works!
I didn't have a system with real drive issues to test against, so I forced the driver to log successes to prove out the concept. The logic is in the functions log_specific_nvme_error
and nvme_error_status
. You can uncomment/comment the indicated lines to have the driver log successes just to prove that it is working. Otherwise, it will only log errors.
Original core.c
I downloaded a copy of the original core.c
file here.
Driver Modifications
Circular Buffer Definition
#define DEBUG_BUF_SIZE 1024 // Adjust based on needed capacity
struct debug_entry {
char info[256]; // Adjust based on what you need to store, e.g., error messages
unsigned long jiffies; // Timestamp
// Extend with more fields if needed
};
static struct debug_entry debug_buf[DEBUG_BUF_SIZE];
static int debug_buf_pos = 0;
// Adds an entry to the circular debug buffer
void add_debug_entry(const char *info) {
unsigned long flags;
// Ensure thread safety (if necessary, depending on usage context)
local_irq_save(flags);
strncpy(debug_buf[debug_buf_pos].info, info, sizeof(debug_buf[debug_buf_pos].info) - 1);
debug_buf[debug_buf_pos].info[sizeof(debug_buf[debug_buf_pos].info) - 1] = '\0'; // Ensure null-termination
debug_buf[debug_buf_pos].jiffies = jiffies;
// Initialize additional fields here if added
debug_buf_pos = (debug_buf_pos + 1) % DEBUG_BUF_SIZE;
local_irq_restore(flags);
}
The code above implements a circular buffer for in-memory debug logging; the sections below break it down piece by piece.
Macro Definition
#define DEBUG_BUF_SIZE 1024
- This line defines a preprocessor macro DEBUG_BUF_SIZE with a value of 1024. It specifies the size of the circular debug buffer, indicating it can hold 1024 entries. This value is chosen as a default capacity and can be adjusted based on the specific needs of the debugging scenario, such as the expected volume of debug messages or available memory.
Structure Definition
struct debug_entry {
char info[256];
unsigned long jiffies;
};
- Defines a struct named debug_entry, which is a custom data type for storing individual debug messages.
- char info[256]; declares an array of 256 characters named info. This array is intended to store a debug message as a null-terminated string. The size, 256, is chosen to accommodate reasonably long messages but can be adjusted as needed.
- unsigned long jiffies; declares a variable named jiffies of type unsigned long. This variable stores the time, in jiffies, when the debug entry was added. Jiffies are a unit of time measurement used in the Linux kernel, representing the number of timer interrupts that have occurred since the system was booted. This provides a timestamp for each debug message, useful for understanding the timing of events.
Global Variables
static struct debug_entry debug_buf[DEBUG_BUF_SIZE];
static int debug_buf_pos = 0;
- static struct debug_entry debug_buf[DEBUG_BUF_SIZE]; declares an array of debug_entry structures, named debug_buf, with a size defined by DEBUG_BUF_SIZE (1024). Being static, this array is only accessible within the file it is declared in, encapsulating the debug buffer from external access or modification.
- static int debug_buf_pos = 0; declares a static integer debug_buf_pos and initializes it to 0. This variable tracks the current position in the circular buffer where the next debug entry will be written. Being static, it retains its value across function calls and is private to the file.
Function Implementation
void add_debug_entry(const char *info) {
unsigned long flags;
// Ensure thread safety (if necessary, depending on usage context)
local_irq_save(flags);
strncpy(debug_buf[debug_buf_pos].info, info, sizeof(debug_buf[debug_buf_pos].info) - 1);
debug_buf[debug_buf_pos].info[sizeof(debug_buf[debug_buf_pos].info) - 1] = '\0'; // Ensure null-termination
debug_buf[debug_buf_pos].jiffies = jiffies;
debug_buf_pos = (debug_buf_pos + 1) % DEBUG_BUF_SIZE;
local_irq_restore(flags);
}
- void add_debug_entry(const char *info) defines a function that takes a single parameter, info, which is a pointer to a constant character array (a C string). This function does not return a value (void).
- unsigned long flags; declares a variable flags to store the current processor flags.
- local_irq_save(flags); is a macro that disables local interrupt processing to prevent race conditions, saving the current state of interrupts in flags. This is crucial for ensuring thread safety, particularly in preemptible or SMP (Symmetric Multiprocessing) kernel configurations, where data structures might be accessed from multiple contexts concurrently.
- strncpy(debug_buf[debug_buf_pos].info, info, sizeof(debug_buf[debug_buf_pos].info) - 1); copies the string pointed to by info into the info array of the current debug_entry in the buffer. It copies up to sizeof(debug_buf[debug_buf_pos].info) - 1 characters to ensure there is space for a null terminator. This prevents buffer overflows.
- debug_buf[debug_buf_pos].info[sizeof(debug_buf[debug_buf_pos].info) - 1] = '\0'; explicitly sets the last character of the info array to '\0' (null terminator), ensuring the string is properly terminated even if the original message exceeds the array size.
- debug_buf[debug_buf_pos].jiffies = jiffies; assigns the current value of jiffies to the jiffies field of the current debug_entry, recording the timestamp of the debug event.
- debug_buf_pos = (debug_buf_pos + 1) % DEBUG_BUF_SIZE; increments debug_buf_pos to point to the next position in the circular buffer, wrapping around to 0 if it reaches the end (DEBUG_BUF_SIZE). This ensures that the buffer operates in a circular manner, reusing space once the end is reached.
- local_irq_restore(flags); re-enables local interrupts, restoring the previous interrupt state stored in flags. This concludes the critical section of the code, where thread safety is ensured by temporarily disabling interrupts.
Customize nvme_error_status
Add a Custom Error Handling Function
void log_specific_nvme_error(u16 status) {
char error_msg[256]; // Adjust size as necessary
// Construct an appropriate message based on the NVMe status code
snprintf(error_msg, sizeof(error_msg), "Specific NVMe error encountered: Status = 0x%x", status);
// Log this occurrence to the circular DRAM buffer
add_debug_entry(error_msg);
}
This function is designed to log specific NVMe errors using a human-readable message format.
- Function Definition and Purpose: void log_specific_nvme_error(u16 status) declares a function that does not return a value (void) and takes a single argument, status, which is an unsigned 16-bit integer (u16). This argument is intended to represent a specific NVMe status code that the function will log.
- Local Variable Declaration: char error_msg[256]; defines a character array named error_msg with a size of 256 bytes. This array is used to store the formatted error message that will be generated based on the NVMe status code provided. The size (256) is chosen to ensure enough space for the message and formatting characters.
- Message Construction: snprintf(error_msg, sizeof(error_msg), "Specific NVMe error encountered: Status = 0x%x", status); uses the snprintf function to write a formatted string into error_msg. The format string includes a placeholder %x for a hexadecimal number, which snprintf replaces with the value of status. This constructs a readable message indicating the NVMe error encountered. The use of sizeof(error_msg) ensures that the function does not write more characters to error_msg than it can hold, preventing buffer overflow.
- Logging the Message: add_debug_entry(error_msg); calls the add_debug_entry function, passing it the error_msg string. This function adds the error message to the circular debug buffer for later retrieval or analysis. This mechanism allows for efficient in-memory logging of error conditions, suitable for environments where minimizing disk I/O is critical.
Modify the existing nvme_error_status
static blk_status_t nvme_error_status(u16 status) {
blk_status_t blk_status;
// printk(KERN_INFO "nvme_error_status: error status 0x%x\n", status);
switch (status & 0x7ff) {
case NVME_SC_SUCCESS:
// Log success with a custom message and return OK
// UNCOMMENT THE BELOW TO LOG SUCCESSES TO TEST THE DRIVER
// log_specific_nvme_error(status);
return BLK_STS_OK;
case NVME_SC_CAP_EXCEEDED:
return BLK_STS_NOSPC;
case NVME_SC_LBA_RANGE:
case NVME_SC_CMD_INTERRUPTED:
case NVME_SC_NS_NOT_READY:
return BLK_STS_TARGET;
case NVME_SC_BAD_ATTRIBUTES:
case NVME_SC_ONCS_NOT_SUPPORTED:
case NVME_SC_INVALID_OPCODE:
case NVME_SC_INVALID_FIELD:
case NVME_SC_INVALID_NS:
return BLK_STS_NOTSUPP;
// Special handling for specific NVMe errors
case NVME_SC_WRITE_FAULT:
case NVME_SC_READ_ERROR:
case NVME_SC_UNWRITTEN_BLOCK:
case NVME_SC_ACCESS_DENIED:
case NVME_SC_READ_ONLY:
case NVME_SC_COMPARE_FAILED:
blk_status = BLK_STS_MEDIUM;
// Call custom function to log specific NVMe errors
log_specific_nvme_error(status);
return blk_status;
case NVME_SC_GUARD_CHECK:
case NVME_SC_APPTAG_CHECK:
case NVME_SC_REFTAG_CHECK:
case NVME_SC_INVALID_PI:
return BLK_STS_PROTECTION;
case NVME_SC_RESERVATION_CONFLICT:
return BLK_STS_NEXUS;
case NVME_SC_HOST_PATH_ERROR:
return BLK_STS_TRANSPORT;
case NVME_SC_ZONE_TOO_MANY_ACTIVE:
return BLK_STS_ZONE_ACTIVE_RESOURCE;
case NVME_SC_ZONE_TOO_MANY_OPEN:
return BLK_STS_ZONE_OPEN_RESOURCE;
default:
return BLK_STS_IOERR;
}
}
The nvme_error_status function translates NVMe status codes into block layer status codes (blk_status_t). It has been modified to include special handling for certain NVMe errors:
- Function Definition: static blk_status_t nvme_error_status(u16 status) defines a static function that returns a blk_status_t value and takes a single u16 argument named status. Being static, this function is only accessible within the file it is defined in.
- Local Variable: blk_status_t blk_status; declares a variable blk_status of type blk_status_t, which is used to store the block layer status code that corresponds to the NVMe status code.
- Switch Statement: The function uses a switch statement to handle different values of status & 0x7ff. The bitwise AND with 0x7ff masks the upper bits, focusing on the relevant bits for status code determination. Each case in the switch statement corresponds to a group of NVMe status codes, mapping them to appropriate block layer status codes.
- Special Error Handling: For specific NVMe errors (NVME_SC_WRITE_FAULT, NVME_SC_READ_ERROR, etc.), the function sets blk_status to BLK_STS_MEDIUM, indicating a medium error in the block layer. It then calls log_specific_nvme_error(status) to log these errors with additional detail before returning blk_status.
- Return Values: Each case in the switch statement returns a blk_status_t value that corresponds to the NVMe status code. This value is used by the block layer to understand the outcome of NVMe operations.
- Default Case: The default case handles any NVMe status codes not explicitly mentioned in the switch statement, returning BLK_STS_IOERR to indicate a generic I/O error.
Register Circular Buffer with /proc
Copy Circular Buffer Data to /proc
#include <linux/uaccess.h> // For copy_to_user()
#include <linux/proc_fs.h> // For proc file operations
#include <linux/seq_file.h> // For sequential reads
ssize_t circular_buffer_read_proc(struct file *filp, char __user *buffer, size_t length, loff_t *offset) {
static int finished = 0; // Static variable to keep track if we've finished reading the buffer
int i;
ssize_t ret = 0;
if (finished) { // Check if we have finished reading the circular buffer
// printk(KERN_INFO "circular_buffer_read_proc: finished reading\n");
finished = 0; // Reset for the next call
return 0;
}
if (*offset >= DEBUG_BUF_SIZE) // Check if the offset is beyond our buffer size
return 0;
// Iterate over the circular buffer from the current offset
for (i = *offset; i < DEBUG_BUF_SIZE; i++) {
int bytes_not_copied;
char entry_info[512]; // Buffer to hold the formatted entry for copying
int entry_len;
// Format the debug entry into entry_info, including the jiffies timestamp
entry_len = snprintf(entry_info, sizeof(entry_info), "[%lu] %s\n", debug_buf[i].jiffies, debug_buf[i].info);
// Ensure we don't copy more than the remaining space in the user buffer can hold
if (length - ret < entry_len) break;
// printk(KERN_INFO "circular_buffer_read_proc: copying entry [%lu] %s\n", debug_buf[i].jiffies, debug_buf[i].info);
// Copy the formatted string to user space
bytes_not_copied = copy_to_user(buffer + ret, entry_info, entry_len);
if (bytes_not_copied == 0) {
ret += entry_len; // Increment return value by the number of bytes successfully copied
*offset = i + 1; // Update offset for partial reads
} else {
break; // Break the loop if we couldn't copy data to user space
}
if (ret >= length) break; // Check if we've filled the user buffer
}
finished = 1; // Indicate we've finished reading the circular buffer
return ret; // Return the number of bytes copied
}
// Note: since kernel 5.6, proc_create() takes a struct proc_ops rather than
// a struct file_operations, so the read handler is wired up via .proc_read
static const struct proc_ops proc_file_fops = {
.proc_read = circular_buffer_read_proc,
};
Register with /proc
Note: This is the entire NVMe init function. Our modifications are toward the end.
#include <linux/init.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/workqueue.h>
#include <linux/blkdev.h>
#include <linux/nvme_ioctl.h>
#include <linux/fs.h>
static struct proc_dir_entry *proc_file_entry;
static int __init nvme_core_init(void)
{
int result = -ENOMEM;
nvme_wq = alloc_workqueue("nvme-wq",
WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
if (!nvme_wq)
goto out;
nvme_reset_wq = alloc_workqueue("nvme-reset-wq",
WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
if (!nvme_reset_wq)
goto destroy_wq;
nvme_delete_wq = alloc_workqueue("nvme-delete-wq",
WQ_UNBOUND | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
if (!nvme_delete_wq)
goto destroy_reset_wq;
result = alloc_chrdev_region(&nvme_ctrl_base_chr_devt, 0,
NVME_MINORS, "nvme");
if (result < 0)
goto destroy_delete_wq;
nvme_class = class_create(THIS_MODULE, "nvme");
if (IS_ERR(nvme_class)) {
result = PTR_ERR(nvme_class);
goto unregister_chrdev;
}
nvme_subsys_class = class_create(THIS_MODULE, "nvme-subsystem");
if (IS_ERR(nvme_subsys_class)) {
result = PTR_ERR(nvme_subsys_class);
goto destroy_class;
}
result = alloc_chrdev_region(&nvme_ns_chr_devt, 0, NVME_MINORS, "nvme-generic");
if (result < 0)
goto destroy_subsys_class;
nvme_ns_chr_class = class_create(THIS_MODULE, "nvme-generic");
if (IS_ERR(nvme_ns_chr_class)) {
result = PTR_ERR(nvme_ns_chr_class);
goto unregister_generic_ns;
}
// Here, the proc file for nvme_circular_buffer is created
proc_file_entry = proc_create("nvme_circular_buffer", 0444, NULL, &proc_fops);
if (!proc_file_entry) {
printk(KERN_ERR "Could not create /proc/nvme_circular_buffer\n");
result = -ENOMEM;
goto destroy_ns_chr_class;
}
return 0; // Initialization success
destroy_ns_chr_class:
class_destroy(nvme_ns_chr_class);
unregister_generic_ns:
unregister_chrdev_region(nvme_ns_chr_devt, NVME_MINORS);
destroy_subsys_class:
class_destroy(nvme_subsys_class);
destroy_class:
class_destroy(nvme_class);
unregister_chrdev:
unregister_chrdev_region(nvme_ctrl_base_chr_devt, NVME_MINORS);
destroy_delete_wq:
destroy_workqueue(nvme_delete_wq);
destroy_reset_wq:
destroy_workqueue(nvme_reset_wq);
destroy_wq:
destroy_workqueue(nvme_wq);
out:
return result;
}
// Since kernel 5.6, proc_create() takes a struct proc_ops; this definition
// must appear above nvme_core_init() in core.c so proc_fops is in scope
static const struct proc_ops proc_fops = {
.proc_read = circular_buffer_read_proc,
};
Module Cleanup
void cleanup_module(void) { // Module cleanup function
proc_remove(proc_file_entry);
}
Background Notes
make modules_prepare
The modules_prepare
target within the Linux kernel's Makefile plays a crucial role in the kernel development and compilation process, especially when dealing with external modules. Let's break down its function and significance in extreme technical detail:
- Linux Kernel Build System: The Linux kernel uses a complex build system based on Makefiles. This system manages the configuration, compilation, and installation of the kernel and its modules. The kernel itself can be built entirely, or specific parts of it can be compiled as needed.
- Modules: Kernel modules are pieces of code that can be loaded into the kernel at runtime, allowing for the kernel's functionality to be extended without the need to reboot or recompile the entire kernel. Modules are used for things like device drivers, filesystem drivers, and system utilities.
The modules_prepare Target
- Purpose: The modules_prepare target prepares the kernel source tree for the compilation of external modules. It ensures that all necessary infrastructure, scripts, and configuration files are in place and ready for module compilation. This is particularly useful for developers who need to build modules against the kernel source without compiling the full kernel.
- Dependency Files: One of the key tasks performed by modules_prepare is the generation of dependency files. These files (.dep files) track the dependencies between source files and header files. They ensure that changes in header files will trigger the recompilation of dependent source files in subsequent builds. This is critical for keeping the build up-to-date with the latest source changes.
- Configuration and Setup:
  - Kbuild System: The kernel uses the Kbuild system, a part of the larger Makefile structure, to manage the compilation of the kernel and modules. modules_prepare ensures that the Kbuild system has all the necessary information about the kernel configuration (.config file) and the target architecture.
  - Symbol Versioning: The target also prepares the symbol versioning file (Module.symvers) if it doesn't exist. This file is crucial for ensuring that the correct versions of symbols are used when linking modules against the kernel. Symbol versioning helps prevent mismatches between module and kernel symbol versions, which could lead to runtime errors or instability.
  - Header Files: It ensures that all public kernel header files are prepared and available in a standard location. This may involve cleaning, processing, and copying header files to the build directory, making them accessible for module compilation.
- Scripts and Tools: The Linux kernel build system uses various scripts and tools (like modpost for module post-processing, mk_elfconfig for generating ELF config files, etc.). modules_prepare ensures that these tools are correctly generated and ready to use. This includes compiling any host tools that are necessary for the build process.
- Consistency: By running modules_prepare, developers can be confident that the kernel build environment is consistent and up-to-date. This consistency is vital for ensuring that external modules will be correctly compiled and linked against the kernel, avoiding potential incompatibilities or build failures.
In summary, the modules_prepare target is an essential component of the Linux kernel's build system, designed to streamline the development and compilation of external kernel modules. By handling dependency generation, ensuring the availability of scripts and tools, and maintaining the build environment's consistency, modules_prepare facilitates a more efficient and reliable module development workflow.
.dep files
Dependency files, often with the .dep extension in many build systems (in the context of Linux kernel builds the extension is not always .dep, but the concept remains), are crucial for the build process, especially in complex projects like the Linux kernel. They play a significant role in managing the relationships between source files and the header files they include, ensuring that changes in dependencies trigger the necessary recompilations. Let's delve into the details:
Structure and Content
A dependency file for a source file essentially lists that source file's direct dependencies, primarily the header files it includes, both directly and indirectly. Here's a simplified example of what the contents might look like for a hypothetical example.c source file:
example.o: example.c header1.h header2.h subdir/header3.h
This line signifies that the object file example.o (compiled from example.c) depends on example.c, header1.h, header2.h, and subdir/header3.h. If any of these files change, make knows to recompile example.c to produce an updated example.o.
Generation
Dependency files are usually generated by the compiler. For GCC (GNU Compiler Collection), the -M, -MM, -MD, and -MMD flags are relevant:
- -M and -MM generate dependency rules as a side effect of compilation. -M includes system headers in the dependency list, while -MM excludes them, focusing on user headers.
- -MD and -MMD are similar but generate the dependency files in parallel with compilation, without affecting the primary output. -MD includes system headers, and -MMD excludes them.
These flags cause the compiler to analyze the source files and list all included headers, creating a .d file alongside the object file (.o), containing the dependency rules.
Contents Detail
In more complex systems or with more detailed flags, dependency files can include additional details:
- Conditional Dependencies: Some build systems extend the format to include conditions under which certain dependencies apply, useful for configurations where different code paths include different headers.
- Ordering Information: While not typically relevant for C/C++ compilation, some dependency formats might include order or priority information for build systems that support parallelized but ordered builds.
- Toolchain-Specific Details: Dependency files might also contain annotations or flags specific to certain compilers or build tools, particularly when special processing steps are involved (e.g., precompiled header generation).
How They Work
During a build, make (or any build system) reads these dependency files to construct a dependency graph of the project. This graph tells the build system which files need to be recompiled when a given file changes. The build system can then ensure that all outputs are up-to-date with their inputs, minimizing the amount of compilation needed by skipping up-to-date targets.
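In a hand-written Makefile, this graph is typically wired up by generating .d files during compilation and pulling them back in with an include directive. A minimal sketch (file names are illustrative):

```makefile
# Compile with -MMD so each .o gets a matching .d dependency file,
# then include those files so header changes trigger recompilation.
SRCS := example.c
OBJS := $(SRCS:.c=.o)

program: $(OBJS)
	$(CC) $(OBJS) -o $@

%.o: %.c
	$(CC) -MMD -c $< -o $@

# '-include' silently ignores missing .d files on the first, clean build.
-include $(OBJS:.o=.d)
```

On the first run only example.c exists as a prerequisite; afterwards, example.d supplies the header dependencies for subsequent runs.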
Kernel-Specific Considerations
In the Linux kernel build system, the use of dependency files is tightly integrated with the Kbuild system. Kbuild processes dependency information to manage not just the compilation of individual modules but also the intricate relationships between different parts of the kernel codebase. This includes handling configurations where modules may depend on specific kernel configuration options or features being enabled.
Kernel dependency files might not always be in a simple .d file format but are managed through the Kbuild system to ensure that all parts of the kernel are correctly built and linked, considering the complex configurations and the variety of hardware platforms the kernel supports.
Symbol Versioning
Symbol versioning in the Linux kernel is a mechanism designed to maintain binary compatibility across different kernel versions, particularly when dealing with kernel modules. This matters because the kernel's internal API (Application Programming Interface) and ABI (Application Binary Interface) can change between versions, potentially breaking modules that depend on specific functions or symbols. Let's look at how symbol versioning works, focusing on the Module.symvers file.
Understanding Symbols in the Kernel
- Symbols: In the Linux kernel, a "symbol" refers to a function or variable's name that can be accessed outside the file it's defined in. These symbols are essential for kernel modules that need to use functions or variables defined in the kernel or other modules.
- Exported Symbols: The kernel allows modules to access certain symbols through a mechanism known as "symbol exporting." Functions like EXPORT_SYMBOL or EXPORT_SYMBOL_GPL are used within the kernel source code to mark a symbol as available for use by modules.
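In kernel source, exporting looks like this. A minimal sketch of a hypothetical module (the function names are invented for illustration, not taken from the driver in this guide):

```c
#include <linux/module.h>

/* A function this module wants to make callable from other modules. */
int debugfoo_get_magic(void)
{
	return 42;
}
/* Export to any module, regardless of license. */
EXPORT_SYMBOL(debugfoo_get_magic);

/* A GPL-only export: visible only to modules declaring a GPL-compatible license. */
int debugfoo_get_magic_gpl(void)
{
	return 43;
}
EXPORT_SYMBOL_GPL(debugfoo_get_magic_gpl);

MODULE_LICENSE("GPL");
```

With CONFIG_MODVERSIONS enabled, each exported symbol also gets a CRC recorded in Module.symvers at build time.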
Symbol Versioning Mechanism
- Purpose: Symbol versioning attaches a version code to each exported symbol. This version code represents the symbol's ABI at the time of export. When a module is compiled against a kernel, it records the version codes of all symbols it uses. At load time, the kernel checks these recorded version codes against its own symbols: if they match, the module is allowed to load; if they do not, loading is rejected to prevent potential instability or crashes due to ABI mismatches.
- Module.symvers: This file plays a central role in symbol versioning, and appears in two places:
- The kernel build generates a top-level Module.symvers listing every exported symbol along with its version code. When external modules are compiled, the build system checks against this list to verify symbol versions.
- Building an external module also produces a Module.symvers for that module, listing the symbols the module itself exports. This matters for modules that depend on symbols exported by other external modules.
The Format of Symbol Versioning Files
A typical entry in Module.symvers looks like this:
0x12345678 symbol_name modulename EXPORT_SYMBOL
- 0x12345678: The CRC (Cyclic Redundancy Check) checksum of the symbol, serving as the version code. The CRC is calculated so that it reflects the symbol's interface, such as its function signature.
- symbol_name: The name of the symbol (function or variable).
- modulename: The module that exports this symbol (vmlinux for symbols exported by the core kernel).
- EXPORT_SYMBOL: The export type, such as EXPORT_SYMBOL or EXPORT_SYMBOL_GPL.
How Symbol Versioning Prevents ABI Mismatches
- At Compile Time: When compiling a module, the build system uses the target kernel's Module.symvers to ensure that all symbols the module uses are available and match the expected ABI version. This helps catch mismatches early, before the module is deployed.
- At Load Time: When loading a module, the kernel performs a similar check, comparing the symbol CRCs recorded in the module binary against its own current symbols and their versions. This runtime check prevents loading modules that might be incompatible with the current kernel's ABI.
Advantages of Symbol Versioning
- Stability: By ensuring that modules only use symbols with matching ABI versions, symbol versioning helps maintain system stability across kernel updates.
- Forward Compatibility: It allows for a degree of forward compatibility, enabling modules compiled against a previous kernel version to work with newer kernels, provided the symbols' ABIs haven't changed.
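On a running system you can inspect these CRCs directly. The paths and module below are illustrative (nvme-core happens to be relevant to this guide); they assume kernel-devel is installed and the module is compressed as .ko.xz, as on Rocky 9:

```shell
# Symbols and CRCs the installed kernel exports (one tab-separated entry per line):
grep nvme_ /usr/src/kernels/$(uname -r)/Module.symvers | head

# CRCs a compiled module was built against:
modprobe --dump-modversions \
    /lib/modules/$(uname -r)/kernel/drivers/nvme/host/nvme-core.ko.xz
```

If the CRCs printed by modprobe do not match the kernel's own, loading the module fails with a "disagrees about version of symbol" error.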
Conclusion
Symbol versioning is a sophisticated mechanism that underpins the modularity and extensibility of the Linux kernel. By managing ABI compatibility through Module.symvers, the Linux kernel can support a wide ecosystem of loadable modules, maintaining system stability and preventing the common pitfalls associated with binary incompatibilities.
make oldconfig
- Kernel Configuration Step: oldconfig is one of the targets provided by the Linux kernel's Makefile for configuring the kernel before compilation. It is specifically designed to update an old configuration file for a new kernel version.
- Functionality:
- When running make oldconfig, the system uses an existing .config file (a file that contains configuration options for the kernel build) as a base. This file typically comes from a previous kernel version or a different build.
- The command then walks through the new kernel's configuration options. If it finds options that are not present in the .config file (because they are new to this version of the kernel), it prompts the user to choose how to set them. Settings that already exist in .config retain their previously chosen values without user intervention.
- For new options, the related olddefconfig target (or a non-interactive oldconfig run) assigns the default values specified by the kernel maintainers without prompting, which is useful for scripts and automated setups.
- Purpose: This process ensures that the configuration file matches the requirements of the current kernel version while preserving the customizations made in previous configurations. It is a crucial step for maintaining kernel builds across version updates, particularly for systems that require a custom-configured kernel.
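For the build in this guide, that amounts to seeding .config from the running system and letting oldconfig reconcile it. A sketch, assuming you are inside the unpacked kernel source tree from the earlier steps:

```shell
# Seed the tree with the running kernel's configuration.
cp /boot/config-$(uname -r) .config

# Interactive: prompts only for options new to this source tree.
make oldconfig

# Or non-interactive: accept maintainer defaults for any new options.
make olddefconfig
```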
make scripts
The make scripts command within the Linux kernel build process targets the compilation and preparation of various scripts and host tools that are essential for the build process but are not part of the kernel itself or its modules. These scripts and tools serve a wide range of purposes, from configuring the kernel and generating documentation to processing system call tables. Understanding its role requires diving into the structure of the kernel's build system and the function of these auxiliary components.
Overview of the Build System
The Linux kernel uses a complex build system primarily based on GNU Make, with Kbuild serving as the kernel's specific extension to this system. The build system is responsible for managing the configuration, compilation, and installation of the kernel, its modules, and associated tools. The Makefile at the root of the kernel source tree orchestrates the build process, invoking other Makefiles and scripts as needed.
Role of the scripts Target
- Purpose: The scripts target compiles and prepares scripts that are necessary for the build process but are not part of the final kernel binary or modules. These scripts include, but are not limited to, Kconfig parsers, documentation generators, and various helper tools that facilitate the build process or the configuration system (like menuconfig, xconfig, etc.).
- Components:
- Kconfig System: Tools for parsing Kconfig files and generating configuration menus (e.g., menuconfig, xconfig). These scripts are crucial for enabling developers and users to configure kernel options before the build.
- Documentation Tools: Scripts that help in generating kernel documentation from source code comments, such as kernel-doc.
- Compiler Helpers: Tools that assist in analyzing source code or preparing it for compilation. This includes modpost for module symbol post-processing, recordmcount to record the positions of mcount calls for function tracing, and scripts for generating syscall tables.
- Configuration Scripts: Scripts that check for the presence of necessary libraries, headers, or tools on the host system, ensuring that the build environment is correctly set up.
How scripts Works
- Execution Flow: When make scripts is invoked, the Makefile processes the scripts target by executing specific rules defined in the top-level Makefile and in other Makefiles included from the scripts directory.
- Compilation: The target compiles any C source files or other compilable sources located in the scripts directory or its subdirectories. This might include host tools written in C that need to be executed during the build process.
- Preparation: Beyond compilation, make scripts also prepares or generates certain files required for the build process. For example, it might generate lexer and parser code from .l and .y files for configuration parsing.
- Installation: Some scripts or tools might be copied or linked to specific locations within the build directory to ensure they are accessible during the build process.
Technical Detail
The process of executing make scripts involves several key steps, often detailed within scripts/Makefile:
- Dependency Checks: Initial checks for necessary tools or libraries on the host system.
- Compilation: Direct compilation of tools within the scripts directory, including, for example, kallsyms, genksyms, modpost, and others.
- Generation: Execution of scripts that generate source code or headers required during the build, like syscall tables or enum mappings.
- Configuration Tools: Building of tools required for kernel configuration, ensuring that they are up-to-date with the source code and ready to use.
make prepare
The make prepare command within the Linux kernel build process is a crucial step that sets up the kernel source tree for building. This target prepares various components of the source tree, ensuring that the necessary infrastructure is in place for a successful build. The prepare target is part of the kernel's Makefile system, which orchestrates the build process through a series of dependency-based rules and targets.
Overview of the Kernel Build System
The Linux kernel utilizes a sophisticated build system based on GNU Make, augmented by the kernel's own Kbuild system. This system is responsible for handling the configuration, compilation, and installation of the kernel, managing dependencies between source files, configuration options, and various build targets.
Purpose of make prepare
- Initial Setup: The prepare target sets up the build environment before the actual compilation of the kernel or its modules begins. This includes preparing header files, generating version strings, and setting up configuration-dependent files.
- Configuration Headers: One of the key tasks is to ensure that all necessary configuration headers are up-to-date and reflect the current configuration state. This often involves generating autoconf.h or other similar files based on the current kernel configuration (.config file).
Components of make prepare
The make prepare target typically encompasses several sub-targets and operations, including but not limited to:
- Generating Version Strings: It generates files that define the kernel version and build timestamps. These might be included in the kernel binary to identify the build version.
- Preparing Configuration Headers: The process generates configuration headers (e.g., autoconf.h) that encapsulate the configuration options selected during the kernel configuration process (make menuconfig, make oldconfig, etc.). These headers are used throughout the kernel code to include or exclude code sections based on the configuration.
- Symbol Versioning Preparation: For modules, it ensures that symbol version files (Module.symvers) are present if needed. These files are critical for maintaining compatibility between the kernel and its loadable modules, especially in terms of API/ABI consistency.
- Architecture-Specific Preparation: The kernel supports multiple architectures (x86, ARM, MIPS, etc.), and make prepare runs architecture-specific preparation steps. This might include generating assembly headers, setting up architecture-specific configurations, or other preparatory tasks required by the target architecture.
- Header Files Cleanup and Preparation: This includes cleaning up and preparing user-space API headers and other necessary header files. The process ensures that exported headers are consistent with the current kernel configuration and ready for use by user-space applications or kernel modules.
- Preparing Scripts: Although scripts are often prepared with make scripts, some preparatory steps ensure that essential build tools or scripts are ready for use during the compilation process.
How make prepare Works
- Execution Flow: When invoked, make processes the prepare target by executing the specified rules in the Makefile. This includes dependencies on other targets that must be completed as part of the preparation process.
- Dependency Resolution: The Makefile system determines the order in which sub-targets and operations are executed, based on dependencies. This ensures that all preparatory steps are completed in the correct sequence.
- Sub-targets Execution: The prepare target may invoke several sub-targets, each responsible for different aspects of the preparation process. These are executed as needed, based on the current state of the source tree and the configuration options selected.