|
@@ -0,0 +1,412 @@
|
|
|
+Linux and the Device Tree
|
|
|
+-------------------------
|
|
|
+The Linux usage model for device tree data
|
|
|
+
|
|
|
+Author: Grant Likely <grant.likely@secretlab.ca>
|
|
|
+
|
|
|
+This article describes how Linux uses the device tree. An overview of
|
|
|
+the device tree data format can be found on the device tree usage page
|
|
|
+at devicetree.org[1].
|
|
|
+
|
|
|
+[1] http://devicetree.org/Device_Tree_Usage
|
|
|
+
|
|
|
+The "Open Firmware Device Tree", or simply Device Tree (DT), is a data
|
|
|
+structure and language for describing hardware. More specifically, it
|
|
|
+is a description of hardware that is readable by an operating system
|
|
|
+so that the operating system doesn't need to hard code details of the
|
|
|
+machine.
|
|
|
+
|
|
|
+Structurally, the DT is a tree, or acyclic graph with named nodes, and
|
|
|
+nodes may have an arbitrary number of named properties encapsulating
|
|
|
+arbitrary data. A mechanism also exists to create arbitrary
|
|
|
+links from one node to another outside of the natural tree structure.
|
|
|
+
|
|
|
+Conceptually, a common set of usage conventions, called 'bindings',
|
|
|
+is defined for how data should appear in the tree to describe typical
|
|
|
+hardware characteristics including data busses, interrupt lines, GPIO
|
|
|
+connections, and peripheral devices.
|
|
|
+
|
|
|
+As much as possible, hardware is described using existing bindings to
|
|
|
+maximize use of existing support code, but since property and node
|
|
|
+names are simply text strings, it is easy to extend existing bindings
|
|
|
+or create new ones by defining new nodes and properties. Be wary,
|
|
|
+however, of creating a new binding without first doing some homework
|
|
|
+about what already exists. There are currently two different,
|
|
|
+incompatible, bindings for i2c busses that came about because the new
|
|
|
+binding was created without first investigating how i2c devices were
|
|
|
+already being enumerated in existing systems.
|
|
|
+
|
|
|
+1. History
|
|
|
+----------
|
|
|
+The DT was originally created by Open Firmware as part of the
|
|
|
+communication method for passing data from Open Firmware to a client
|
|
|
+program (such as an operating system). An operating system used the
|
|
|
+Device Tree to discover the topology of the hardware at runtime, and
|
|
|
+thereby support a majority of available hardware without hard coded
|
|
|
+information (assuming drivers were available for all devices).
|
|
|
+
|
|
|
+Since Open Firmware is commonly used on PowerPC and SPARC platforms,
|
|
|
+the Linux support for those architectures has for a long time used the
|
|
|
+Device Tree.
|
|
|
+
|
|
|
+In 2005, when PowerPC Linux began a major cleanup and the merging of 32-bit
|
|
|
+and 64-bit support, the decision was made to require DT support on all
|
|
|
+powerpc platforms, regardless of whether or not they used Open
|
|
|
+Firmware. To do this, a DT representation called the Flattened Device
|
|
|
+Tree (FDT) was created which could be passed to the kernel as a binary
|
|
|
+blob without requiring a real Open Firmware implementation. U-Boot,
|
|
|
+kexec, and other bootloaders were modified to support both passing a
|
|
|
+Device Tree Binary (dtb) and modifying a dtb at boot time. DT was
|
|
|
+also added to the PowerPC boot wrapper (arch/powerpc/boot/*) so that
|
|
|
+a dtb could be wrapped up with the kernel image to support booting
|
|
|
+existing non-DT aware firmware.
|
|
|
+
|
|
|
+Some time later, FDT infrastructure was generalized to be usable by
|
|
|
+all architectures. At the time of this writing, 6 mainlined
|
|
|
+architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1
|
|
|
+out of mainline (nios) have some level of DT support.
|
|
|
+
|
|
|
+2. Data Model
|
|
|
+-------------
|
|
|
+If you haven't already read the Device Tree Usage[1] page,
|
|
|
+then go read it now. It's okay, I'll wait....
|
|
|
+
|
|
|
+2.1 High Level View
|
|
|
+-------------------
|
|
|
+The most important thing to understand is that the DT is simply a data
|
|
|
+structure that describes the hardware. There is nothing magical about
|
|
|
+it, and it doesn't magically make all hardware configuration problems
|
|
|
+go away. What it does do is provide a language for decoupling the
|
|
|
+hardware configuration from the board and device driver support in the
|
|
|
+Linux kernel (or any other operating system for that matter). Using
|
|
|
+it allows board and device support to become data driven; to make
|
|
|
+setup decisions based on data passed into the kernel instead of on
|
|
|
+per-machine hard coded selections.
|
|
|
+
|
|
|
+Ideally, data driven platform setup should result in less code
|
|
|
+duplication and make it easier to support a wide range of hardware
|
|
|
+with a single kernel image.
|
|
|
+
|
|
|
+Linux uses DT data for three major purposes:
|
|
|
+1) platform identification,
|
|
|
+2) runtime configuration, and
|
|
|
+3) device population.
|
|
|
+
|
|
|
+2.2 Platform Identification
|
|
|
+---------------------------
|
|
|
+First and foremost, the kernel will use data in the DT to identify the
|
|
|
+specific machine. In a perfect world, the specific platform shouldn't
|
|
|
+matter to the kernel because all platform details would be described
|
|
|
+perfectly by the device tree in a consistent and reliable manner.
|
|
|
+Hardware is not perfect though, and so the kernel must identify the
|
|
|
+machine during early boot so that it has the opportunity to run
|
|
|
+machine-specific fixups.
|
|
|
+
|
|
|
+In the majority of cases, the machine identity is irrelevant, and the
|
|
|
+kernel will instead select setup code based on the machine's core
|
|
|
+CPU or SoC. On ARM for example, setup_arch() in
|
|
|
+arch/arm/kernel/setup.c will call setup_machine_fdt() in
|
|
|
+arch/arm/kernel/devicetree.c which searches through the machine_desc
|
|
|
+table and selects the machine_desc which best matches the device tree
|
|
|
+data. It determines the best match by looking at the 'compatible'
|
|
|
+property in the root device tree node, and comparing it with the
|
|
|
+dt_compat list in struct machine_desc.
|
|
|
+
|
|
|
+The 'compatible' property contains a sorted list of strings starting
|
|
|
+with the exact name of the machine, followed by an optional list of
|
|
|
+boards it is compatible with sorted from most compatible to least. For
|
|
|
+example, the root compatible properties for the TI BeagleBoard and its
|
|
|
+successor, the BeagleBoard xM board might look like:
|
|
|
+
|
|
|
+	compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
+	compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";
|
|
|
+
|
|
|
+Where "ti,omap3-beagleboard-xm" specifies the exact model, it also
|
|
|
+claims that it is compatible with the OMAP 3450 SoC, and the omap3 family
|
|
|
+of SoCs in general. You'll notice that the list is sorted from most
|
|
|
+specific (exact board) to least specific (SoC family).
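+
+In the flattened tree, a string list such as 'compatible' is stored as
+NUL-terminated strings packed back to back. The following userspace
+sketch shows how the position of one string in such a list could be
+found. compat_index() is a hypothetical helper written for this
+illustration, not a kernel function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): return the zero-based position
 * of 'name' within a packed list of NUL-terminated strings, or -1 if it
 * is absent. 'len' is the total length of the list in bytes, including
 * every terminating NUL.
 */
int compat_index(const char *list, size_t len, const char *name)
{
	size_t off = 0;
	int idx = 0;

	while (off < len) {
		const char *str = list + off;
		/* Bound the search so a missing final NUL cannot overrun. */
		const char *end = memchr(str, '\0', len - off);

		if (!end)
			return -1;	/* malformed list: missing terminator */
		if (strcmp(str, name) == 0)
			return idx;
		off += (size_t)(end - str) + 1;
		idx++;
	}
	return -1;
}
```

+A lower index means a more specific entry, which is the ordering that
+machine selection relies on.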
|
|
|
+
|
|
|
+Astute readers might point out that the Beagle xM could also claim
|
|
|
+compatibility with the original Beagle board. However, one should be
|
|
|
+cautioned about doing so at the board level since there is typically a
|
|
|
+high level of change from one board to another, even within the same
|
|
|
+product line, and it is hard to nail down exactly what is meant when one
|
|
|
+board claims to be compatible with another. For the top level, it is
|
|
|
+better to err on the side of caution and not claim one board is
|
|
|
+compatible with another. The notable exception would be when one
|
|
|
+board is a carrier for another, such as a CPU module attached to a
|
|
|
+carrier board.
|
|
|
+
|
|
|
+One more note on compatible values. Any string used in a compatible
|
|
|
+property must be documented as to what it indicates. Add
|
|
|
+documentation for compatible strings in Documentation/devicetree/bindings.
|
|
|
+
|
|
|
+Again on ARM, for each machine_desc, the kernel looks to see if
|
|
|
+any of the dt_compat list entries appear in the compatible property.
|
|
|
+If one does, then that machine_desc is a candidate for driving the
|
|
|
+machine. After searching the entire table of machine_descs,
|
|
|
+setup_machine_fdt() returns the 'most compatible' machine_desc based
|
|
|
+on which entry in the compatible property each machine_desc matches
|
|
|
+against. If no matching machine_desc is found, then it returns NULL.
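+
+The selection described above can be sketched in plain C. This is a
+simplified model, not the kernel's implementation: the struct keeps
+only two fields, the root 'compatible' list is assumed to be already
+split into a NULL-terminated array, and match_score() and
+select_machine() are names invented for this example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct machine_desc; only what the sketch needs. */
struct machine_desc {
	const char *name;
	const char *const *dt_compat;	/* NULL-terminated list */
};

/*
 * Score one descriptor against the root 'compatible' list. The score is
 * the 1-based position of the first root entry found in dt_compat, so a
 * lower score is a more specific match. 0 means no match at all.
 */
static unsigned int match_score(const struct machine_desc *m,
				const char *const *root_compat)
{
	unsigned int pos;
	const char *const *c;

	for (pos = 0; root_compat[pos]; pos++)
		for (c = m->dt_compat; *c; c++)
			if (strcmp(root_compat[pos], *c) == 0)
				return pos + 1;
	return 0;
}

/* Search the whole table and return the best-scoring entry, or NULL. */
const struct machine_desc *select_machine(const struct machine_desc *table,
					  size_t n,
					  const char *const *root_compat)
{
	const struct machine_desc *best = NULL;
	unsigned int best_score = UINT_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int score = match_score(&table[i], root_compat);

		if (score && score < best_score) {
			best = &table[i];
			best_score = score;
		}
	}
	return best;
}
```

+With one descriptor listing "ti,omap3" and another listing
+"ti,omap3-beagleboard", the original BeagleBoard tree selects the second,
+while the BeagleBoard xM tree falls back to the first.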
|
|
|
+
|
|
|
+The reasoning behind this scheme is the observation that in the majority
|
|
|
+of cases, a single machine_desc can support a large number of boards
|
|
|
+if they all use the same SoC, or same family of SoCs. However,
|
|
|
+invariably there will be some exceptions where a specific board will
|
|
|
+require special setup code that is not useful in the generic case.
|
|
|
+Special cases could be handled by explicitly checking for the
|
|
|
+troublesome board(s) in generic setup code, but doing so very quickly
|
|
|
+becomes ugly and/or unmaintainable if it is more than just a couple of
|
|
|
+cases.
|
|
|
+
|
|
|
+Instead, the compatible list allows a generic machine_desc to provide
|
|
|
+support for a wide common set of boards by specifying "less
|
|
|
+compatible" values in the dt_compat list. In the example above,
|
|
|
+generic board support can claim compatibility with "ti,omap3" or
|
|
|
+"ti,omap3450". If a bug was discovered on the original beagleboard
|
|
|
+that required special workaround code during early boot, then a new
|
|
|
+machine_desc could be added which implements the workarounds and only
|
|
|
+matches on "ti,omap3-beagleboard".
|
|
|
+
|
|
|
+PowerPC uses a slightly different scheme where it calls the .probe()
|
|
|
+hook from each machine_desc, and the first one returning TRUE is used.
|
|
|
+However, this approach does not take into account the priority of the
|
|
|
+compatible list, and probably should be avoided for new architecture
|
|
|
+support.
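+
+The drawback of a first-match scheme shows up in a small model. All
+names here (struct probed_machine, probe_first(), and the two probe
+hooks) are hypothetical, and the OMAP3 strings are reused from the
+earlier example purely for illustration. This is not PowerPC code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for this sketch: a name plus a probe hook. */
struct probed_machine {
	const char *name;
	bool (*probe)(const char *const *root_compat);
};

/* True if 'id' appears anywhere in the NULL-terminated compatible list. */
static bool has_compat(const char *const *root_compat, const char *id)
{
	for (; *root_compat; root_compat++)
		if (strcmp(*root_compat, id) == 0)
			return true;
	return false;
}

/* Example probe hooks: one generic, one board specific. */
bool probe_generic_omap3(const char *const *root_compat)
{
	return has_compat(root_compat, "ti,omap3");
}

bool probe_beagle(const char *const *root_compat)
{
	return has_compat(root_compat, "ti,omap3-beagleboard");
}

/* First-match selection: table order alone decides between candidates. */
const struct probed_machine *probe_first(const struct probed_machine *table,
					 size_t n,
					 const char *const *root_compat)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].probe(root_compat))
			return &table[i];
	return NULL;
}
```

+If the generic entry is listed first, a BeagleBoard tree selects it even
+though a more specific entry exists, so the result depends on table
+order rather than on the priority of the compatible list.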
|
|
|
+
|
|
|
+2.3 Runtime configuration
|
|
|
+-------------------------
|
|
|
+In most cases, a DT will be the sole method of communicating data from
|
|
|
+firmware to the kernel, so it also gets used to pass in runtime and
|
|
|
+configuration data like the kernel parameters string and the location
|
|
|
+of an initrd image.
|
|
|
+
|
|
|
+Most of this data is contained in the /chosen node, and when booting
|
|
|
+Linux it will look something like this:
|
|
|
+
|
|
|
+	chosen {
+		bootargs = "console=ttyS0,115200 loglevel=8";
+		linux,initrd-start = <0xc8000000>;
+		linux,initrd-end = <0xc8200000>;
+	};
|
|
|
+
|
|
|
+The bootargs property contains the kernel arguments, and the initrd-*
|
|
|
+properties give the start and end addresses of an initrd blob. The
|
|
|
+chosen node may also optionally contain an arbitrary number of
|
|
|
+additional properties for platform-specific configuration data.
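+
+Property values in the blob are stored as 32-bit big-endian cells. In
+the sketch below the end value is taken to be the first address after
+the image, so the size is the difference of the two. cell_to_cpu() and
+initrd_size() are illustrative names, not kernel functions.

```c
#include <stdint.h>

/* Decode one 32-bit big-endian cell into a host-order integer. */
uint32_t cell_to_cpu(const unsigned char cell[4])
{
	return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8) | (uint32_t)cell[3];
}

/*
 * Size in bytes of an initrd spanning [start, end). An empty or
 * inverted range is reported as 0 rather than wrapping around.
 */
uint32_t initrd_size(uint32_t start, uint32_t end)
{
	return end > start ? end - start : 0;
}
```

+For the values in the example above, the start and end cells decode to
+0xc8000000 and 0xc8200000, giving a 2 MiB image.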
|
|
|
+
|
|
|
+During early boot, the architecture setup code calls of_scan_flat_dt()
|
|
|
+several times with different helper callbacks to parse device tree
|
|
|
+data before paging is set up. The of_scan_flat_dt() code scans through
|
|
|
+the device tree and uses the helpers to extract information required
|
|
|
+during early boot. Typically the early_init_dt_scan_chosen() helper
|
|
|
+is used to parse the chosen node including kernel parameters,
|
|
|
+early_init_dt_scan_root() to initialize the DT address space model,
|
|
|
+and early_init_dt_scan_memory() to determine the size and
|
|
|
+location of usable RAM.
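+
+The callback pattern can be modelled as follows. This toy version walks
+an array instead of a real blob. struct flat_node, scan_nodes(), and
+find_chosen() are made up for the example. Only the convention that a
+non-zero return from the helper stops the scan is meant to mirror
+of_scan_flat_dt().

```c
#include <stddef.h>
#include <string.h>

/* Toy node record: a unit name and its depth below the root. */
struct flat_node {
	const char *name;
	int depth;
};

typedef int (*scan_cb)(const struct flat_node *node, void *data);

/*
 * Call 'cb' on each node in order. Stop as soon as the callback returns
 * non-zero and hand that value back to the caller.
 */
int scan_nodes(const struct flat_node *nodes, size_t n, scan_cb cb, void *data)
{
	int rc = 0;
	size_t i;

	for (i = 0; i < n && !rc; i++)
		rc = cb(&nodes[i], data);
	return rc;
}

/* Example helper: record the depth-1 node named "chosen" and stop. */
int find_chosen(const struct flat_node *node, void *data)
{
	const struct flat_node **found = data;

	if (node->depth != 1 || strcmp(node->name, "chosen") != 0)
		return 0;	/* not interesting, keep scanning */
	*found = node;
	return 1;
}
```

+Each early helper can be written in this shape: ignore nodes it does not
+care about, extract what it needs from the one it does, and return
+non-zero to end the scan.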
|
|
|
+
|
|
|
+On ARM, the function setup_machine_fdt() is responsible for early
|
|
|
+scanning of the device tree after selecting the correct machine_desc
|
|
|
+that supports the board.
|
|
|
+
|
|
|
+2.4 Device population
|
|
|
+---------------------
|
|
|
+After the board has been identified, and after the early configuration data
|
|
|
+has been parsed, then kernel initialization can proceed in the normal
|
|
|
+way. At some point in this process, unflatten_device_tree() is called
|
|
|
+to convert the data into a more efficient runtime representation.
|
|
|
+This is also when machine-specific setup hooks will get called, like
|
|
|
+the machine_desc .init_early(), .init_irq() and .init_machine() hooks
|
|
|
+on ARM. The remainder of this section uses examples from the ARM
|
|
|
+implementation, but all architectures will do pretty much the same
|
|
|
+thing when using a DT.
|
|
|
+
|
|
|
+As can be guessed by the names, .init_early() is used for any machine-
|
|
|
+specific setup that needs to be executed early in the boot process,
|
|
|
+and .init_irq() is used to set up interrupt handling. Using a DT
|
|
|
+doesn't materially change the behaviour of either of these functions.
|
|
|
+If a DT is provided, then both .init_early() and .init_irq() are able
|
|
|
+to call any of the DT query functions (of_* in include/linux/of*.h) to
|
|
|
+get additional data about the platform.
|
|
|
+
|
|
|
+The most interesting hook in the DT context is .init_machine() which
|
|
|
+is primarily responsible for populating the Linux device model with
|
|
|
+data about the platform. Historically this has been implemented on
|
|
|
+embedded platforms by defining a set of static clock structures,
|
|
|
+platform_devices, and other data in the board support .c file, and
|
|
|
+registering it en-masse in .init_machine(). When DT is used, then
|
|
|
+instead of hard coding static devices for each platform, the list of
|
|
|
+devices can be obtained by parsing the DT, and allocating device
|
|
|
+structures dynamically.
|
|
|
+
|
|
|
+The simplest case is when .init_machine() is only responsible for
|
|
|
+registering a block of platform_devices. A platform_device is a concept
|
|
|
+used by Linux for memory or I/O mapped devices which cannot be detected
|
|
|
+by hardware, and for 'composite' or 'virtual' devices (more on those
|
|
|
+later). While there is no 'platform device' terminology for the DT,
|
|
|
+platform devices roughly correspond to device nodes at the root of the
|
|
|
+tree and children of simple memory mapped bus nodes.
|
|
|
+
|
|
|
+About now is a good time to lay out an example. Here is part of the
|
|
|
+device tree for the NVIDIA Tegra board.
|
|
|
+
|
|
|
+/{
+	compatible = "nvidia,harmony", "nvidia,tegra20";
+	#address-cells = <1>;
+	#size-cells = <1>;
+	interrupt-parent = <&intc>;
+
+	chosen { };
+	aliases { };
+
+	memory {
+		device_type = "memory";
+		reg = <0x00000000 0x40000000>;
+	};
+
+	soc {
+		compatible = "nvidia,tegra20-soc", "simple-bus";
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges;
+
+		intc: interrupt-controller@50041000 {
+			compatible = "nvidia,tegra20-gic";
+			interrupt-controller;
+			#interrupt-cells = <1>;
+			reg = <0x50041000 0x1000>, < 0x50040100 0x0100 >;
+		};
+
+		serial@70006300 {
+			compatible = "nvidia,tegra20-uart";
+			reg = <0x70006300 0x100>;
+			interrupts = <122>;
+		};
+
+		i2s1: i2s@70002800 {
+			compatible = "nvidia,tegra20-i2s";
+			reg = <0x70002800 0x100>;
+			interrupts = <77>;
+			codec = <&wm8903>;
+		};
+
+		i2c@7000c000 {
+			compatible = "nvidia,tegra20-i2c";
+			#address-cells = <1>;
+			#size-cells = <0>;
+			reg = <0x7000c000 0x100>;
+			interrupts = <70>;
+
+			wm8903: codec@1a {
+				compatible = "wlf,wm8903";
+				reg = <0x1a>;
+				interrupts = <347>;
+			};
+		};
+	};
+
+	sound {
+		compatible = "nvidia,harmony-sound";
+		i2s-controller = <&i2s1>;
+		i2s-codec = <&wm8903>;
+	};
+};
|
|
|
+
|
|
|
+At .init_machine() time, Tegra board support code will need to look at
|
|
|
+this DT and decide which nodes to create platform_devices for.
|
|
|
+However, looking at the tree, it is not immediately obvious what kind
|
|
|
+of device each node represents, or even if a node represents a device
|
|
|
+at all. The /chosen, /aliases, and /memory nodes are informational
|
|
|
+nodes that don't describe devices (although arguably memory could be
|
|
|
+considered a device). The children of the /soc node are memory mapped
|
|
|
+devices, but the codec@1a is an i2c device, and the sound node
|
|
|
+represents not a device, but rather how other devices are connected
|
|
|
+together to create the audio subsystem. I know what each device is
|
|
|
+because I'm familiar with the board design, but how does the kernel
|
|
|
+know what to do with each node?
|
|
|
+
|
|
|
+The trick is that the kernel starts at the root of the tree and looks
|
|
|
+for nodes that have a 'compatible' property. First, it is generally
|
|
|
+assumed that any node with a 'compatible' property represents a device
|
|
|
+of some kind, and second, it can be assumed that any node at the root
|
|
|
+of the tree is either directly attached to the processor bus, or is a
|
|
|
+miscellaneous system device that cannot be described any other way.
|
|
|
+For each of these nodes, Linux allocates and registers a
|
|
|
+platform_device, which in turn may get bound to a platform_driver.
|
|
|
+
|
|
|
+Why is using a platform_device for these nodes a safe assumption?
|
|
|
+Well, for the way that Linux models devices, just about all bus_types
|
|
|
+assume that their devices are children of a bus controller. For
|
|
|
+example, each i2c_client is a child of an i2c_master. Each spi_device
|
|
|
+is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The
|
|
|
+same hierarchy is also found in the DT, where I2C device nodes only
|
|
|
+ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB,
|
|
|
+etc. The only devices which do not require a specific type of parent
|
|
|
+device are platform_devices (and amba_devices, but more on that
|
|
|
+later), which will happily live at the base of the Linux /sys/devices
|
|
|
+tree. Therefore, if a DT node is at the root of the tree, then it
|
|
|
+is probably best registered as a platform_device.
|
|
|
+
|
|
|
+Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL)
|
|
|
+to kick off discovery of devices at the root of the tree. The
|
|
|
+parameters are all NULL because when starting from the root of the
|
|
|
+tree, there is no need to provide a starting node (the first NULL), a
|
|
|
+parent struct device (the last NULL), and we're not using a match
|
|
|
+table (yet). For a board that only needs to register devices,
|
|
|
+.init_machine() can be completely empty except for the
|
|
|
+of_platform_populate() call.
|
|
|
+
|
|
|
+In the Tegra example, this accounts for the /soc and /sound nodes, but
|
|
|
+what about the children of the SoC node? Shouldn't they be registered
|
|
|
+as platform devices too? For Linux DT support, the generic behaviour
|
|
|
+is for child devices to be registered by the parent's device driver at
|
|
|
+driver .probe() time. So, an i2c bus device driver will register an
|
|
|
+i2c_client for each child node, an SPI bus driver will register
|
|
|
+its spi_device children, and similarly for other bus_types.
|
|
|
+According to that model, a driver could be written that binds to the
|
|
|
+SoC node and simply registers platform_devices for each of its
|
|
|
+children. The board support code would allocate and register an SoC
|
|
|
+device, a (theoretical) SoC device driver could bind to the SoC device,
|
|
|
+and register platform_devices for /soc/interrupt-controller, /soc/serial,
|
|
|
+/soc/i2s, and /soc/i2c in its .probe() hook. Easy, right?
|
|
|
+
|
|
|
+Actually, it turns out that registering children of some
|
|
|
+platform_devices as more platform_devices is a common pattern, and the
|
|
|
+device tree support code reflects that and makes the above example
|
|
|
+simpler. The second argument to of_platform_populate() is an
|
|
|
+of_device_id table, and any node that matches an entry in that table
|
|
|
+will also get its child nodes registered. In the Tegra case, the code
|
|
|
+can look something like this:
|
|
|
+
|
|
|
+static void __init harmony_init_machine(void)
+{
+	/* ... */
+	of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
+}
|
|
|
+
|
|
|
+"simple-bus" is defined in the ePAPR 1.0 specification as a compatible value
|
|
|
+meaning a simple memory mapped bus, so the of_platform_populate() code
|
|
|
+could be written to just assume simple-bus compatible nodes will
|
|
|
+always be traversed. However, we pass it in as an argument so that
|
|
|
+board support code can always override the default behaviour.
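+
+The traversal can be summarized with a userspace model. It is a sketch
+of the behaviour described above, not the kernel implementation:
+struct node, is_bus(), and populate() are invented for the example, and
+"registering" a device just records its name in a caller-supplied array.

```c
#include <stddef.h>
#include <string.h>

/* Toy device node: just enough state to model the traversal. */
struct node {
	const char *name;
	const char *const *compatible;		/* NULL-terminated, NULL if absent */
	const struct node *const *children;	/* NULL-terminated, NULL if none */
};

/* Non-zero if the node is compatible with any entry of the bus table. */
static int is_bus(const struct node *np, const char *const *bus_ids)
{
	const char *const *c;
	const char *const *b;

	if (!bus_ids)
		return 0;	/* no match table: never descend */
	for (c = np->compatible; *c; c++)
		for (b = bus_ids; *b; b++)
			if (strcmp(*c, *b) == 0)
				return 1;
	return 0;
}

/*
 * Record one device per child of 'parent' that has a 'compatible'
 * property, and descend into children that match the bus table.
 * Returns the number of names written to 'out' (at most 'max').
 */
size_t populate(const struct node *parent, const char *const *bus_ids,
		const char **out, size_t max)
{
	size_t n = 0;
	const struct node *const *c;

	if (!parent->children)
		return 0;
	for (c = parent->children; *c && n < max; c++) {
		if (!(*c)->compatible)
			continue;	/* no 'compatible': not treated as a device */
		out[n++] = (*c)->name;
		if (is_bus(*c, bus_ids))
			n += populate(*c, bus_ids, out + n, max - n);
	}
	return n;
}
```

+Applied to a model of the Tegra tree, a NULL bus table yields only the
+soc and sound nodes, while a table containing "simple-bus" also yields
+the four children of the soc node. The codec under the i2c node is left
+for the i2c bus driver in both cases.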
|
|
|
+
|
|
|
+[Need to add discussion of adding i2c/spi/etc child devices]
|
|
|
+
|
|
|
+Appendix A: AMBA devices
|
|
|
+------------------------
|
|
|
+
|
|
|
+ARM Primecells are a certain kind of device attached to the ARM AMBA
|
|
|
+bus which include some support for hardware detection and power
|
|
|
+management. In Linux, struct amba_device and the amba_bus_type are
|
|
|
+used to represent Primecell devices. However, the fiddly bit is that
|
|
|
+not all devices on an AMBA bus are Primecells, and for Linux it is
|
|
|
+typical for both amba_device and platform_device instances to be
|
|
|
+siblings of the same bus segment.
|
|
|
+
|
|
|
+When using the DT, this creates problems for of_platform_populate()
|
|
|
+because it must decide whether to register each node as either a
|
|
|
+platform_device or an amba_device. This unfortunately complicates the
|
|
|
+device creation model a little bit, but the solution turns out not to
|
|
|
+be too invasive. If a node is compatible with "arm,primecell", then
|
|
|
+of_platform_populate() will register it as an amba_device instead of a
|
|
|
+platform_device.
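+
+The decision amounts to a single check on the node's compatible list.
+A minimal sketch, with enum dev_kind and classify_node() as hypothetical
+names. The Primecell marker string is passed in as a parameter so that
+the example does not hard code its value.

```c
#include <string.h>

enum dev_kind { DEV_PLATFORM, DEV_AMBA };

/*
 * Pick the device type for a node from its NULL-terminated compatible
 * list. 'primecell_id' is the compatible string that marks a Primecell.
 */
enum dev_kind classify_node(const char *const *compatible,
			    const char *primecell_id)
{
	for (; *compatible; compatible++)
		if (strcmp(*compatible, primecell_id) == 0)
			return DEV_AMBA;
	return DEV_PLATFORM;
}
```

+Sibling nodes on the same bus can therefore end up as different device
+types, depending only on what each node's compatible property claims.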
|