In the first part of this blog post series on Linux kernel initcalls, we looked at their purpose, their usage, and ways to debug them (using initcall_debug or FTrace). In this second part, we’ll go deeper into the implementation of initcalls, with a look at the colorful __define_initcall() macro, the rootfs initcall, and how module init functions are executed.

If you haven’t already read part 1, I highly recommend reading it before continuing.

Now, let’s begin. Here’s a reminder of what we learned in part 1:

  • an initcall level is identified by an ID: 2 in the case of a postcore initcall
  • an initcall definition leads to a final __define_initcall(). This is what we will now focus on.

Implementation

DEFINE_INITCALL

If we use our dummy example, postcore_initcall() expands to a first __define_initcall() (with an ID of 2), which in turn expands to ___define_initcall() with 3 arguments. Here is a summary of where we left off in the previous article:

Now, let’s expand this final ___define_initcall() macro line by line:

The ___define_initcall() macro takes the following parameters:

  • the initcall’s function name we want to create (mypostcore_init in our case)
  • the initcall ID (2 for postcore)
  • the section name prefix used in the object file (.initcall2).

All these parameters are used to create an initcall_t entry whose name is built from them. In our example, __initcall_mypostcore_init2.

The __attribute__ keyword with its section attribute lets us choose the object-file section in which this entry is placed: .initcall2.init in the case of a postcore initcall. It is the same for all postcore initcalls, which are all grouped in .initcall2.init sections.
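
To make the naming and the section placement more concrete, here is a minimal user-space sketch of the same idea. It is not the kernel's macro: DEMO_DEFINE_INITCALL, DEMO_SECTION and the demo_initcall_ prefix are hypothetical names chosen for this illustration, and the section attribute is only applied when building with a GCC-compatible compiler for an ELF target.

```c
#include <stdio.h>

/* Same shape as the kernel's initcall_t: a function returning int. */
typedef int (*initcall_t)(void);

/* Assumption: only GCC-compatible compilers on ELF targets get a section. */
#if defined(__GNUC__) && defined(__ELF__)
#define DEMO_SECTION(name) __attribute__((used, section(name)))
#else
#define DEMO_SECTION(name)
#endif

/*
 * Hypothetical stand-in for ___define_initcall(fn, id, sec):
 *  - fn and id are pasted into the variable name,
 *  - #sec ".init" becomes the section name string.
 */
#define DEMO_DEFINE_INITCALL(fn, id, sec) \
	static initcall_t demo_initcall_##fn##id DEMO_SECTION(#sec ".init") = fn

static int mypostcore_init(void)
{
	puts("mypostcore_init called");
	return 0;
}

/* Creates demo_initcall_mypostcore_init2, stored in ".initcall2.init". */
DEMO_DEFINE_INITCALL(mypostcore_init, 2, .initcall2);
```

Calling demo_initcall_mypostcore_init2() runs mypostcore_init() through the generated pointer, which is all the kernel needs later: a pointer with a predictable name in a predictable section.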

Using objdump will confirm that. We can look at a freshly built kernel object file and search for our function’s name:

$ objdump -t vmlinux.o | grep postcore_init2
0000007c l     O .initcall2.init	00000004 __initcall_mypostcore_init2

We have a .initcall2.init section containing our entry __initcall_mypostcore_init2, which leads to our postcore dummy example. To summarize, the __define_initcall() macro creates, in an object-file section specific to the initcall level (thanks to its ID), a function pointer to the function we registered.

If we look at all the existing initcall2 entries (i.e. postcore initcalls), we can see that the function pointers follow one another:

$ objdump -t vmlinux.o | grep .initcall2.init
00000000 l     O .initcall2.init	00000004 __initcall_atomic_pool_init2
00000004 l     O .initcall2.init	00000004 __initcall_mvebu_soc_device2
00000008 l     O .initcall2.init	00000004 __initcall_coherency_late_init2
0000000c l     O .initcall2.init	00000004 __initcall_imx_mmdc_init2
00000010 l     O .initcall2.init	00000004 __initcall_omap_hwmod_setup_all2
[...]
0000007c l     O .initcall2.init	00000004 __initcall_mypostcore_init2
00000080 l     O .initcall2.init	00000004 __initcall_rockchip_grf_init2
[...]

This .initcall2.init section contains the function addresses of all registered postcore initcalls. Their order is fixed at build time and depends on the order in the Makefiles.

Level-initcalls ordering: Makefile!

Let’s run an example to show that the ordering between the initcalls of one level comes from the ordering in the Makefile and not from anything else (alphabetical order, …).

  • We create 2 dummy examples as postcore initcalls: mydriver.c containing the mydriver_func() initcall and myotherdriver.c containing the myotherdriver_func() initcall. Let’s put these two drivers in the RTC subsystem (of course, it could be anywhere else):
    $ cat drivers/rtc/mydriver.c
    #include <linux/init.h>
    
    static int __init mydriver_func(void)
    {
    	return 0;
    }
    postcore_initcall(mydriver_func);
    
    $ cat drivers/rtc/myotherdriver.c
    #include <linux/init.h>
    
    static int __init myotherdriver_func(void)
    {
    	return 0;
    }
    postcore_initcall(myotherdriver_func);
    
  • We list mydriver first in the Makefile, followed by myotherdriver:
    $ git diff drivers/rtc/Makefile
    [...]
    -rtc-core-y                     := class.o interface.o
    +rtc-core-y                     := class.o interface.o mydriver.o myotherdriver.o
    
  • After a compilation, let’s look at the object-file:
    $ objdump -t vmlinux.o | grep "driver_func"
    0008c3c8 l     F .init.text	00000008 mydriver_func
    000000c8 l     O .initcall2.init	00000004 __initcall_mydriver_func2
    0008c3d0 l     F .init.text	00000008 myotherdriver_func
    000000cc l     O .initcall2.init	00000004 __initcall_myotherdriver_func2
    

As you can see, each entry has its own offset within the section: 000000c8 for __initcall_mydriver_func2 and 000000cc for __initcall_myotherdriver_func2. The entry for __initcall_mydriver_func2 comes before the one for __initcall_myotherdriver_func2.

  • Finally, let’s check the execution order by using FTrace:
    # cat /sys/kernel/debug/tracing/trace | grep driver_func
           swapper/0-1     [000] ....     0.059546: initcall_start: func=mydriver_func+0x0/0x8
           swapper/0-1     [000] ....     0.059556: initcall_finish: func=mydriver_func+0x0/0x8 ret=0
           swapper/0-1     [000] ....     0.059571: initcall_start: func=myotherdriver_func+0x0/0x8
           swapper/0-1     [000] ....     0.059581: initcall_finish: func=myotherdriver_func+0x0/0x8 ret=0
    

mydriver_func is executed before myotherdriver_func.

  • Now, let’s invert the order only in the Makefile and let’s reproduce the exact same test:
    $ git diff drivers/rtc/Makefile
    [...]
    -rtc-core-y                     := class.o interface.o
    +rtc-core-y                     := class.o interface.o myotherdriver.o mydriver.o
    
  • The section in vmlinux.o is also inverted:
    $ objdump -t vmlinux.o | grep "driver_func"
    0008c3c8 l     F .init.text	00000008 myotherdriver_func
    000000c8 l     O .initcall2.init	00000004 __initcall_myotherdriver_func2
    0008c3d0 l     F .init.text	00000008 mydriver_func
    000000cc l     O .initcall2.init	00000004 __initcall_mydriver_func2
    
  • and the execution of the functions too:
    # cat /sys/kernel/debug/tracing/trace | grep driver_func
           swapper/0-1     [000] ....     0.059520: initcall_start: func=myotherdriver_func+0x0/0x8
           swapper/0-1     [000] ....     0.059530: initcall_finish: func=myotherdriver_func+0x0/0x8 ret=0
           swapper/0-1     [000] ....     0.059545: initcall_start: func=mydriver_func+0x0/0x8
           swapper/0-1     [000] ....     0.059555: initcall_finish: func=mydriver_func+0x0/0x8 ret=0
    

So far, we know that declaring a function as an initcall creates, in each driver's object file, an entry in a section specific to the initcall level (postcore_initcall => .initcall2.init), and that the entries of a given level are ordered in the final kernel image according to the Makefile ordering.

But how is the kernel ordering all the initcall levels between themselves? When is a postcore initcall executed relative to the other initcalls? How is it handled? Let’s find out…

Initcall functions

If you remember, each type of initcall has an ID. This is the key to the ordering. From the previous section, we know that each type of initcall gets a different section name according to its ID: .initcall1.init, .initcall2.init, etc.

The main implementation of initcall ordering is done in init/main.c. Yes, really, you are looking at init/main.c in Linux Kernel’s code!

initcall_levels is an array in which each entry points to the start of one level: initcall_levels[] holds the different __initcallN_start symbols, followed by __initcall_end.

extern initcall_entry_t __initcall_start[];
extern initcall_entry_t __initcall0_start[];
extern initcall_entry_t __initcall1_start[];
extern initcall_entry_t __initcall2_start[];
extern initcall_entry_t __initcall3_start[];
extern initcall_entry_t __initcall4_start[];
extern initcall_entry_t __initcall5_start[];
extern initcall_entry_t __initcall6_start[];
extern initcall_entry_t __initcall7_start[];
extern initcall_entry_t __initcall_end[];

static initcall_entry_t *initcall_levels[] __initdata = {
        __initcall0_start,
        __initcall1_start,
        __initcall2_start,
        __initcall3_start,
        __initcall4_start,
        __initcall5_start,
        __initcall6_start,
        __initcall7_start,
        __initcall_end,
};

We already know that initcalls are a mechanism to place chosen functions in specific object-file sections, which are iterated over at boot time. To do that, the kernel must somehow know where these sections actually are. This is achieved by the linker, using a linker script that creates the __initcallN_start symbols (include/asm-generic/vmlinux.lds.h):

#define INIT_CALLS_LEVEL(level)                   \
                __initcall##level##_start = .;    \
                KEEP(*(.initcall##level##.init))  \
                KEEP(*(.initcall##level##s.init))

After compilation, the resulting linker script (arch/arm/kernel/vmlinux.lds) looks like:

.init.data : AT(ADDR(.init.data) - 0)

__initcall_start = .; 			KEEP(*(.initcallearly.init))
__initcall0_start = .; 		KEEP(*(.initcall0.init))
__initcall1_start = .; 		KEEP(*(.initcall1.init))
__initcall2_start = .; 		KEEP(*(.initcall2.init))
__initcall3_start = .; 		KEEP(*(.initcall3.init))
__initcall4_start = .; 		KEEP(*(.initcall4.init))
__initcall5_start = .; 		KEEP(*(.initcall5.init))
__initcallrootfs_start = .; 	KEEP(*(.initcallrootfs.init))
__initcall6_start = .; 		KEEP(*(.initcall6.init))
__initcall7_start = .; 		KEEP(*(.initcall7.init))
__initcall_end = .;

Without being linker script experts, we can assume that the __initcall2_start symbol points to the first address of the .initcall2.init section in the object file.
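
The same "start symbol" trick can be tried in user space. The sketch below is an illustration under stated assumptions, not what the kernel does: instead of a custom linker script, it relies on GNU ld (and LLD) automatically providing __start_<name> and __stop_<name> symbols for a section whose name is a valid C identifier, so it needs a GCC-compatible compiler on an ELF target such as Linux. DEMO_INITCALL, demo_initcalls and demo_run_section are hypothetical names.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Store one function pointer in the "demo_initcalls" section. */
#define DEMO_INITCALL(fn) \
	static initcall_t demo_entry_##fn \
	__attribute__((used, section("demo_initcalls"))) = fn

/* Provided by the linker: first entry and one past the last entry. */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

static int demo_calls;

static int first_init(void)  { demo_calls++; return 0; }
static int second_init(void) { demo_calls++; return 0; }

DEMO_INITCALL(first_init);
DEMO_INITCALL(second_init);

/* Walk the section between the two symbols and call every entry. */
static size_t demo_run_section(void)
{
	size_t n = 0;
	initcall_t *fn;

	for (fn = __start_demo_initcalls; fn < __stop_demo_initcalls; fn++, n++)
		(*fn)();
	return n;
}
```

demo_run_section() returns the number of entries found in the section. The kernel's linker script goes one step further by defining one start symbol per level, which is what lets it tell the levels apart.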

The main function that will process all the possible initcall levels is called do_initcalls() and is available in init/main.c:

static void __init do_basic_setup(void)
{
	[...]
	do_initcalls();
}

static void __init do_initcalls(void)
{
	int level;
	[...]

	for (level = 0; level < ARRAY_SIZE(initcall_levels) - 1; level++) {
		[...]
		do_initcall_level(level, command_line);
	}
}

This function walks through all the levels of the array. A quick word about the command_line parameter: it is only a copy of the usual command line, which can contain parameters for modules. For each level, do_initcalls() calls do_initcall_level(), whose (simplified) code is the following:

static void __init do_initcall_level(int level, char *command_line)
{
	initcall_entry_t *fn;
	[...]
	for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
		do_one_initcall(initcall_from_entry(fn));
}

do_initcall_level() calls every initcall of a given level through do_one_initcall(). The for loop walks an array of initcall_entry_t: each entry of the section holds a function pointer, and the entries are stored one after another. For level 2, the first value of fn is the address given by __initcall2_start, i.e. the first entry of the .initcall2.init section, and fn++ moves to the next entry until the start of the next level is reached. Within the section, the entries appear in Makefile order. The loop therefore walks the .initcall2.init section we dumped earlier:

$ objdump -t vmlinux.o | grep .initcall2.init
00000000 l     O .initcall2.init	00000004 __initcall_atomic_pool_init2
00000004 l     O .initcall2.init	00000004 __initcall_mvebu_soc_device2
00000008 l     O .initcall2.init	00000004 __initcall_coherency_late_init2
0000000c l     O .initcall2.init	00000004 __initcall_imx_mmdc_init2
00000010 l     O .initcall2.init	00000004 __initcall_omap_hwmod_setup_all2
[...]
0000007c l     O .initcall2.init	00000004 __initcall_mypostcore_init2
00000080 l     O .initcall2.init	00000004 __initcall_rockchip_grf_init2
[...]

In the above example, the values of fn would be:

  • 1st iteration: fn equals __initcall2_start which corresponds to the address of .initcall2.init=00000000 => __initcall_atomic_pool_init2
  • 2nd iteration: fn equals next address of .initcall2.init=00000004 => __initcall_mvebu_soc_device2
  • 3rd iteration: fn equals next address of .initcall2.init=00000008 => __initcall_coherency_late_init2
  • and so on until it reaches the end of level 2.
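
The two nested loops can be modeled in plain C. The sketch below is only an illustration: it replaces the linker-provided sections with one flat table, and all the names (demo_table, demo_levels, the dummy functions and their level numbering) are hypothetical.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Records the order in which the dummy initcalls run. */
static int demo_order[8];
static size_t demo_count;

static int record(int id) { demo_order[demo_count++] = id; return 0; }

static int core_a(void)     { return record(10); }
static int postcore_a(void) { return record(20); }
static int postcore_b(void) { return record(21); }
static int arch_a(void)     { return record(30); }

/* One flat table standing in for the contiguous .initcallN.init sections. */
static initcall_t demo_table[] = {
	core_a,                 /* first level */
	postcore_a, postcore_b, /* second level */
	arch_a,                 /* third level */
};

/* Start of each level, plus a final end marker like __initcall_end. */
static initcall_t *demo_levels[] = {
	demo_table + 0,
	demo_table + 1,
	demo_table + 3,
	demo_table + 4, /* one past the last entry */
};

/* Same loop shape as do_initcall_level(): stop at the next level's start. */
static void demo_do_initcall_level(size_t level)
{
	initcall_t *fn;

	for (fn = demo_levels[level]; fn < demo_levels[level + 1]; fn++)
		(*fn)();
}

/* Same loop shape as do_initcalls(): "- 1" skips the end marker. */
static void demo_do_initcalls(void)
{
	size_t level;

	for (level = 0; level < DEMO_ARRAY_SIZE(demo_levels) - 1; level++)
		demo_do_initcall_level(level);
}
```

After demo_do_initcalls(), demo_order holds 10, 20, 21, 30: levels run in increasing order, and inside a level the functions run in table order, which plays the role of the Makefile order here.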

do_one_initcall() itself looks like this (simplified):

int __init_or_module do_one_initcall(initcall_t fn)
{
	int ret;
	[...]

	do_trace_initcall_start(fn);
	ret = fn();
	do_trace_initcall_finish(fn, ret);
	[...]

	return ret;
}

The code above has two important points:

  • a use of start/finish trace functions (see Debugging section of the first post about initcalls).
  • the execution of the initcall_t which corresponds to the function created by the user.
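
These two points can be reproduced with a small user-space wrapper. This is only a sketch: the demo_ functions are hypothetical, the trace hooks simply print to stdout, and the function name is passed explicitly because this example has no symbol lookup.

```c
#include <stdio.h>

typedef int (*initcall_t)(void);

/* Counts trace events so the behavior can be checked. */
static int demo_trace_events;

static void demo_trace_start(const char *name)
{
	demo_trace_events++;
	printf("initcall_start: func=%s\n", name);
}

static void demo_trace_finish(const char *name, int ret)
{
	demo_trace_events++;
	printf("initcall_finish: func=%s ret=%d\n", name, ret);
}

/* Trace, call the function pointer, trace again, return its result. */
static int demo_do_one_initcall(initcall_t fn, const char *name)
{
	int ret;

	demo_trace_start(name);
	ret = fn();
	demo_trace_finish(name, ret);
	return ret;
}

static int mydriver_func(void) { return 0; }
static int failing_func(void)  { return -1; }
```

The wrapper returns whatever the initcall returned, so a failing function is still traced on both sides and its error code reaches the caller.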

To summarize, initcall_levels is an array holding the __initcallN_start symbol of every initcall level. Each symbol corresponds to the first address of the .initcallN.init section used for that level. Take again the example of postcore_initcall. The first .initcall2.init entry (depending on the Makefile ordering) has the same address as the one pointed to by __initcall2_start, so it is the first function run by do_one_initcall(). Then, with the for loop of do_initcall_level(), we move to the next function pointer's address (thanks to fn++) and so on until we reach the end of all the initcall2 entries. Finally, thanks to do_initcalls(), we move to the next level, i.e. initcall3.

Rootfs initcall

If you look at all the initcall definitions, everything is based on an ID: 2 in the case of postcore_initcall(), but for rootfs_initcall() the ID is the string rootfs. Let's have a look at this particular initcall.

In the init folder, we can see that it is mainly used to mount a rootfs, either from an initramfs or from a block device.

$ git grep rootfs_initcall init/
init/initramfs.c:rootfs_initcall(populate_rootfs);
init/noinitramfs.c:rootfs_initcall(default_rootfs);

According to what we have seen previously, we will get an object-file section holding the corresponding function pointer; which function it points to depends on whether initial RAM filesystem support is enabled in our kernel's configuration.

$ objdump -t vmlinux.o | grep .initcallrootfs
00000000 l    d  .initcallrootfs.init	00000000 .initcallrootfs.init
00000000 l       .initcallrootfs.init	00000000 $d
00000000 l     O .initcallrootfs.init	00000004 __initcall_populate_rootfsrootfs
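
The doubled "rootfsrootfs" at the end of the symbol name is simply the function name followed by the ID. The sketch below reproduces the preprocessor mechanics in user space; DEMO_STR, DEMO_ENTRY_NAME and DEMO_SECTION_NAME are hypothetical helpers that only build the names as strings, they do not create any section.

```c
#include <string.h>

/* Turn the (already pasted) tokens into a string literal. */
#define DEMO_STR(x) #x

/* Entry name: "__initcall_" + function name + ID, pasted with ##. */
#define DEMO_ENTRY_NAME(fn, id) DEMO_STR(__initcall_##fn##id)

/* Section name: ".initcall" + ID, then the ".init" suffix. */
#define DEMO_SECTION_NAME(id)   DEMO_STR(.initcall##id) ".init"

static const char *demo_rootfs_entry   = DEMO_ENTRY_NAME(populate_rootfs, rootfs);
static const char *demo_rootfs_section = DEMO_SECTION_NAME(rootfs);
static const char *demo_postcore_entry = DEMO_ENTRY_NAME(mypostcore_init, 2);

/* Returns 1 when the generated names match the objdump output above. */
static int demo_matches_objdump(void)
{
	return strcmp(demo_rootfs_entry, "__initcall_populate_rootfsrootfs") == 0 &&
	       strcmp(demo_rootfs_section, ".initcallrootfs.init") == 0;
}
```

Token pasting does not care whether the ID is a number or a word, which is why a textual ID such as rootfs works just like 2.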

What about modules?

As you may remember from the previous part of this series, module_init() makes a module's init function run as a device_initcall when the module is built in. For a loadable module, the function is executed when the module is inserted. The code is the following:

#define early_initcall(fn)		module_init(fn)
#define core_initcall(fn)		module_init(fn)
#define postcore_initcall(fn)		module_init(fn)
#define arch_initcall(fn)		module_init(fn)
#define subsys_initcall(fn)		module_init(fn)
#define fs_initcall(fn)			module_init(fn)
#define rootfs_initcall(fn)		module_init(fn)
#define device_initcall(fn)		module_init(fn)
#define late_initcall(fn)		module_init(fn)

#define console_initcall(fn)		module_init(fn)

/* Each module must use one module_init(). */
#define module_init(initfn)					\
	static inline initcall_t __maybe_unused __inittest(void)		\
	{ return initfn; }					\
	int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));

We have already seen the case of a non-loadable module (i.e. #ifndef MODULE, in part 1), so let's quickly look at the case of a loadable module. All the initcalls are replaced by one single definition: module_init(). This macro creates init_module as an alias to our function. For a module, some additional generated code stores init_module in the .init field of struct module. The function do_init_module() is called at insertion time via a syscall. If you look closer, it uses a function that we have already talked about:

static noinline int do_init_module(struct module *mod)
{
[...]
        /* Start the module */
        if (mod->init != NULL)
                ret = do_one_initcall(mod->init);
[...]

This function uses our previous do_one_initcall() with mod->init as the initcall function to execute! Thanks to additional code generated by modpost, .init = init_module, and init_module is an alias to our function.

To sum up, when a loadable module is inserted, the syscall that handles the insertion calls the function passed to module_init() as an initcall. To keep this generic, an alias (init_module) points to this specific function and is stored in the init field of the module's structure. In other words, when you load a module, the syscall executes do_init_module(), which runs our function through the existing do_one_initcall().
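
The alias mechanism can be tried outside the kernel. The following is a minimal sketch, assuming a GCC-compatible compiler on an ELF target (the alias attribute is not available everywhere); DEMO_MODULE_INIT, demo_init_module, struct demo_module and demo_do_init_module are hypothetical, heavily reduced stand-ins for their kernel counterparts.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

static int demo_init_runs;

/* The driver author's init function. */
static int mymodule_init(void)
{
	demo_init_runs++;
	return 0;
}

/* Stand-in for module_init(): a fixed name aliasing the init function. */
#define DEMO_MODULE_INIT(initfn) \
	int demo_init_module(void) __attribute__((alias(#initfn)))

DEMO_MODULE_INIT(mymodule_init);

/* Reduced stand-in for struct module: only the init pointer is kept. */
struct demo_module {
	initcall_t init;
};

/* Plays the role of the generated code that sets .init = init_module. */
static struct demo_module demo_this_module = { .init = demo_init_module };

/* Same idea as do_init_module(): run mod->init if there is one. */
static int demo_do_init_module(struct demo_module *mod)
{
	int ret = 0;

	if (mod->init != NULL)
		ret = mod->init();
	return ret;
}
```

Calling demo_do_init_module(&demo_this_module) ends up in mymodule_init() even though the loader side only ever knows the fixed name demo_init_module.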

Conclusion

To avoid going again through all the mechanisms we have seen around the implementation of initcalls, I will conclude with a drawing that sums up the interactions and the implementation.

And that's it! We have seen a lot of stuff in these two articles about initcalls. I hope you enjoyed reading them as much as I enjoyed writing them. And it is so cool to look at the main.c of the Linux Kernel, right?!
