For the copy operation this number is two: the VirtIO device header and a preallocated buffer for the data payload. While in the copy case the driver could report completion of the send operation immediately, this is impossible when it uses SG with system-owned physical buffers. The driver decides how to transfer each specific packet on a per-packet basis, so it uses a common completion scheme: the packet is reported as completed when VirtIO returns the buffers submitted for it.
All the members of the adapter context structure related to the sending path are protected by the Send Lock (see also Synchronization). Each of these fragments may contain more than one part in physical memory. Failed packets are labeled as if their transmit buffer was already released. The WL is processed on each exit from the main TX procedure; finished packets are completed and the resources attached to the send entry are freed.
In order to obtain them, the driver must initiate a mapping operation per packet, and only in the callback procedure does it receive the SG list of the packet. In the latter case, the order in which the callback is called for different packets is not guaranteed, or at least this is not documented.
Inside the main body of the TX operation the driver peeks at the packet list at the head of the Send Queue, retrieves the next packet to send from it and tries to submit it. If the packet fails, the packet completion procedure is called immediately.
When the packet is completed by VirtIO or fails, the packet completion procedure labels it as finished, increments the number of finished packets in the tNetBufferListEntry and frees the resources associated with the specific tNetBufferEntry. The main body of TX exits its loop when all the packets are submitted or when the next packet is delayed because there are no VirtIO buffers for it.
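To make the counting scheme concrete, a deliberately simplified sketch follows; the real tNetBufferListEntry and tNetBufferEntry contain more fields, and the helper name and layout here are invented for illustration only.

#include <stdbool.h>

/* Hypothetical, simplified shapes of the structures named above. */
typedef struct _tNetBufferEntry {
    void *mappedBuffers;     /* VirtIO/SG resources to release      */
    bool  finished;
} tNetBufferEntry;

typedef struct _tNetBufferListEntry {
    int nPackets;            /* packets contained in this list      */
    int nFinished;           /* packets already completed or failed */
} tNetBufferListEntry;

/* Called when VirtIO returns the buffers of a packet or the packet fails;
   returns true when the whole list can be completed to NDIS. */
static bool OnPacketFinished(tNetBufferListEntry *list, tNetBufferEntry *pkt)
{
    pkt->finished = true;
    /* ...free pkt->mappedBuffers here... */
    list->nFinished++;
    return list->nFinished == list->nPackets;
}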
The checksum offload is controlled by configuration and is disabled by default. The implementation of the SW emulation is in sw-offload. All the procedures work with a data block starting from the beginning of the IP header. The exact required modification or verification of the checksum is specified by a parameter bit mask passed to the procedure.
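To make the bit-mask interface concrete, here is a hypothetical sketch of an IPv4 header checksum fix/verify routine that works on a block starting at the IP header. The flag names, values and function names are invented for illustration and are not the driver's actual API.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SWCS_IP_FIX     0x01   /* recalculate the IPv4 header checksum */
#define SWCS_IP_VERIFY  0x02   /* only verify the existing checksum    */

/* One's-complement (Internet) checksum over len bytes. */
static uint16_t csum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    uint16_t word;

    while (len > 1) {
        memcpy(&word, data, 2);          /* host byte order, matched on store */
        sum += word;
        data += 2;
        len -= 2;
    }
    if (len) {
        word = 0;
        memcpy(&word, data, 1);
        sum += word;
    }
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* 'ip' points at the beginning of the IP header, as described above. */
static int SwOffloadProcess(uint8_t *ip, size_t len, uint32_t flags)
{
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;   /* IPv4 header length */
    uint16_t cs;

    if (ihl < 20 || ihl > len)
        return -1;                             /* malformed header */

    if (flags & SWCS_IP_FIX) {
        ip[10] = ip[11] = 0;                   /* clear the checksum field */
        cs = csum16(ip, ihl);
        memcpy(ip + 10, &cs, 2);               /* same byte order as the sum */
        return 0;
    }
    if (flags & SWCS_IP_VERIFY)
        return csum16(ip, ihl) == 0 ? 0 : -1;  /* valid header sums to zero */
    return 0;
}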
Currently we declare support for only one encapsulation. Note that the packet data must be in a contiguous buffer in virtual address space in order to be processed. The driver sets the limit of the block size that can be submitted to 0xF. During tests NDIS uses different packet sizes and sets the MSS to various values; sometimes the host generates a burst of packets, so the test may fail if other traffic is present on the tap.
In general, the VLAN to operate on is provided by the operating system via the driver's configuration settings; the driver needs to populate this value in the outgoing tag. One of the requirements for a network adapter mentioned in the Logo requirements is padding of outgoing packets shorter than 60 bytes with zeroes up to 60 bytes. The driver must be careful when parsing the chain of fragments related to each packet provided for transmission. Each packet has a declared data length; the buffer chain may continue past this valid area, and the following buffers are not valid. The driver must ignore them and never touch them.
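As a minimal sketch of the 60-byte padding rule above, assuming the packet data has already been gathered into a contiguous staging buffer with room for at least 60 bytes (the helper name is illustrative):

#include <string.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_TX_FRAME 60   /* minimum Ethernet frame length, excluding CRC */

/* Returns the length actually submitted to the device. */
static size_t PadShortPacket(uint8_t *buf, size_t len)
{
    if (len < MIN_TX_FRAME) {
        memset(buf + len, 0, MIN_TX_FRAME - len);  /* zero-fill the tail */
        len = MIN_TX_FRAME;
    }
    return len;
}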
This change also affects the TX path; during initialization the driver allocates a bigger area for the VirtIO header. During initialization the driver creates the VirtIO queue, which is automatically sized according to the size of the queue in the device.
The driver must provide physical buffers to place the received data into, as well as structures for tracking the buffer descriptors.
Thus, for any packet the driver receives, it needs a contiguous buffer with known physical and virtual addresses. The driver uses a configuration parameter for the initial number of RX buffers to prepare.
Using this number it preallocates these system-dependent objects (one per buffer) and pool objects for them, and uses them when required. Then the driver starts preparing descriptor structures (listable control blocks) for receiving.
The driver prepares the descriptors one by one. For that, during initialization the driver allocates storage for an array of pointers to use during batch indication. It checks the destination address of each packet and decides, using the current packet filter mask, whether the packet shall be indicated up to the network stack or dropped. The size of the batch is defined during initialization; it can include all the buffers of VirtIO, but during the loop the procedure may, in general, retrieve more buffers than VirtIO contains, as some buffers may be returned by NDIS, put back into VirtIO and returned to the driver again.
The indication procedure must be called without holding the spinlock, as the return of buffers from NDIS to the driver may happen synchronously or asynchronously. Currently this procedure is not optimal: it removes the tag by moving data inside the buffer; it can be optimized later.
All the variables and objects related to the RX path are protected by the Receive Lock. The transition from Disabled to Enabled is simple and synchronous; the transition from Enabled to Disabled is asynchronous, with a possible intermediate Pausing state in which the RX path waits for the return of all packets indicated to NDIS, and the TX path waits for completion of all packets whose sending is in progress.
When asynchronous pausing is started, sending and receiving of new packets are suppressed, and callback procedures are set to indicate the end of the transition and provide an indication to NDIS when applicable. The power-off and power-on procedures also include pause (suspend) and resume as steps of their execution. The driver recognizes these events when it reads the interrupt status and passes it in a context variable to the DPC procedure. When the connect detection bit in the interrupt status is set (bit 1), the driver in the DPC rereads the connect status from the configuration array of the device (the configuration array contains the MAC address and connect status, accessed via the IO space of the PCI device) and sends an indication to NDIS about the change in connection state.
Processing of interrupts and the DPC continues; any packet received in the not-connected state is returned to the VirtIO queue without indication. For both, on each OID call the driver receives an exchange buffer and its input and output sizes; the driver needs to indicate the number of bytes it read or wrote on successful completion; on failure due to a too-small buffer, the driver indicates the proper buffer size required for the operation.
This allows splitting the OID support implementation into system-dependent and system-independent parts, where the entry points are system-dependent and the implementation for many OIDs is common.
In other rare cases the driver completes the OID asynchronously using scheduled work items or special indications. Each system-dependent implementation contains a table of structures defining all the supported OID operations; each table entry describes one supported OID. This configuration always contains the encapsulation setting (one of those the driver supports) and the placement of the IP header in the data buffers of packets to be sent (non-IP packets are of no interest; offload will never be required for them).
Using configuration parameters, some or all offload tasks can be masked from the reports and left disabled. The driver retrieves the initial set of possible offload tasks (capabilities) and the initial set of enabled offload tasks (configuration) from configuration parameters and reports them to NDIS during its initialization sequence via NdisMSetMiniportAttributes. Historically, the driver was required to support different versions of the offload task formats (V1 and V2), where V2 also includes IPv6-related fields.
This OID for D3 cannot be failed (in the best case this would abort the system transition to the low-power state); in general, the driver shall never fail power management OID requests. In general, the state of hardware devices in the system hibernation state S4 is very close to power-off. During the transition to the S0 system state the VirtIO device will be fully reinitialized.
During the system transition to the sleep (S3) state some device context may persist. Unfortunately, the testing software does not accept such behavior, so the driver must simulate extended support for power management.
To test your mapping, try printing out the device status register. This is a 4-byte register that starts at byte 8 of the register space. Hint: You'll need a lot of constants, like the locations of registers and the values of bit masks.
Trying to copy these out of the developer's manual is error-prone and mistakes can lead to painful debugging sessions.
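For illustration, a small header of such constants might look like the sketch below. The offsets shown are the ones the 8254x developer's manual gives for these registers, but verify every value against the manual yourself before relying on it.

/* Illustrative subset of E1000 register offsets (byte offsets into the
   memory-mapped register space) and one bit mask. */
#define E1000_STATUS   0x00008    /* Device Status (4 bytes at byte 8)     */
#define E1000_TCTL     0x00400    /* Transmit Control                      */
#define E1000_TCTL_EN  0x00000002 /* Transmit Enable bit in TCTL           */
#define E1000_TDBAL    0x03800    /* Transmit Descriptor Base Address Low  */
#define E1000_TDBAH    0x03804    /* Transmit Descriptor Base Address High */
#define E1000_TDLEN    0x03808    /* Transmit Descriptor Length            */
#define E1000_TDH      0x03810    /* Transmit Descriptor Head              */
#define E1000_TDT      0x03818    /* Transmit Descriptor Tail              */

/* With a volatile uint32_t *e1000 mapping of the register space, a
   register at byte offset OFF is e1000[OFF >> 2], e.g.:
       uint32_t status = e1000[E1000_STATUS >> 2];                       */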
If you start from an existing header of register definitions, we don't recommend copying it in verbatim, because such a header defines far more than you actually need and may not define things in the way you need; still, it's a good starting point. You could imagine transmitting and receiving packets by writing and reading from the E1000's registers, but this would be slow and would require the E1000 to buffer packet data internally. Instead, the driver is responsible for allocating memory for the transmit and receive queues, setting up DMA descriptors, and configuring the E1000 with the location of these queues, but everything after that is asynchronous.
To transmit a packet, the driver copies it into the next DMA descriptor in the transmit queue and informs the E1000 that another packet is available; the E1000 will copy the data out of the descriptor when there is time to send the packet. Likewise, when the E1000 receives a packet, it copies it into the next DMA descriptor in the receive queue, which the driver can read from at its next opportunity.
The receive and transmit queues are very similar at a high level. Both consist of a sequence of descriptors. While the exact structure of these descriptors varies, each descriptor contains some flags and the physical address of a buffer containing packet data (either packet data for the card to send, or a buffer allocated by the OS for the card to write a received packet to). The queues are implemented as circular arrays, meaning that when the card or the driver reaches the end of the array, it wraps back around to the beginning.
Both have a head pointer and a tail pointer, and the contents of the queue are the descriptors between these two pointers. The hardware always consumes descriptors from the head and moves the head pointer, while the driver always adds descriptors to the tail and moves the tail pointer. The descriptors in the transmit queue represent packets waiting to be sent (hence, in the steady state, the transmit queue is empty).
For the receive queue, the descriptors in the queue are free descriptors that the card can receive packets into (hence, in the steady state, the receive queue consists of all available receive descriptors). Correctly updating the tail register without confusing the E1000 is tricky; be careful! The pointers to these arrays, as well as the addresses of the packet buffers in the descriptors, must all be physical addresses, because the hardware performs DMA directly to and from physical RAM without going through the MMU.
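As a mental model only (this is not E1000-specific code, and the names are invented), the head/tail bookkeeping looks roughly like this:

#include <stdint.h>

#define RING_SIZE 64   /* number of descriptors in the circular array */

struct ring {
    uint32_t head;     /* next descriptor the hardware will consume   */
    uint32_t tail;     /* next descriptor the driver will fill        */
};

/* Descriptors currently handed to the hardware are those in
   [head, tail), taken modulo RING_SIZE. */
static uint32_t ring_in_use(const struct ring *r)
{
    return (r->tail + RING_SIZE - r->head) % RING_SIZE;
}

/* The driver adds at the tail and advances it with wrap-around;
   the hardware does the same with the head as it consumes entries. */
static void ring_advance_tail(struct ring *r)
{
    r->tail = (r->tail + 1) % RING_SIZE;
}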
The transmit and receive functions of the E1000 are basically independent of each other, so we can work on one at a time. We'll attack transmitting packets first, simply because we can't test receive without transmitting an "I'm here!" packet first. First, you'll have to initialize the card to transmit, following the steps described in the manual's transmit initialization section. The first step of transmit initialization is setting up the transmit queue. The precise structure of the queue is described in section 3. We won't be using the TCP offload features of the E1000, so you can focus on the "legacy transmit descriptor format".
You'll find it convenient to use C structs to describe the E1000's structures. As you've seen with structures like struct Trapframe, C structs let you precisely lay out data in memory.
C can insert padding between fields, but the E1000's structures are laid out such that this shouldn't be a problem. If you do encounter field alignment problems, look into GCC's "packed" attribute.
As an example, consider the legacy transmit descriptor given in the manual. The first byte of the structure starts at the top right of the manual's table, so to convert it into a C struct, read from right to left, top to bottom. If you squint at it right, you'll see that all of the fields even fit nicely into standard-size types.
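A sketch of that struct, following the legacy transmit descriptor layout in the 8254x developer's manual; treat the field widths and order as something to check against the manual's table rather than as given.

#include <stdint.h>

/* Legacy transmit descriptor: 16 bytes, read right-to-left, top-to-bottom
   from the manual's table. The packed attribute is shown for safety even
   though the fields are naturally aligned. */
struct tx_desc {
    uint64_t addr;     /* physical address of the packet buffer */
    uint16_t length;   /* bytes of data in the buffer           */
    uint8_t  cso;      /* checksum offset                       */
    uint8_t  cmd;      /* command bits (EOP, RS, ...)           */
    uint8_t  status;   /* status bits (DD, ...) plus reserved   */
    uint8_t  css;      /* checksum start                        */
    uint16_t special;
} __attribute__((packed));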
Your driver will have to reserve memory for the transmit descriptor array and the packet buffers pointed to by the transmit descriptors. There are several ways to do this, ranging from dynamically allocating pages to simply declaring them in global variables.
Whatever you choose, keep in mind that the E1000 accesses physical memory directly, which means any buffer it accesses must be contiguous in physical memory. There are also multiple ways to handle the packet buffers. The simplest, which we recommend starting with, is to reserve space for a packet buffer for each descriptor during driver initialization and simply copy packet data into and out of these pre-allocated buffers.
The maximum size of an Ethernet packet is 1518 bytes, which bounds how big these buffers need to be. More sophisticated drivers could dynamically allocate packet buffers.
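One minimal way to reserve that memory, assuming static globals are acceptable, that kernel globals are physically contiguous (as in JOS), and reusing the struct tx_desc sketch above; the names and sizes are illustrative.

#include <stdint.h>

#define NTXDESC  64      /* a multiple of 8, so TDLEN is a multiple of 128 */
#define TX_BUFSZ 1518    /* maximum Ethernet frame size                    */

/* Descriptor ring: the base address programmed into TDBAL/TDBAH must be
   16-byte aligned and physically contiguous. */
__attribute__((aligned(16)))
static struct tx_desc tx_ring[NTXDESC];

/* One pre-allocated packet buffer per descriptor. */
static uint8_t tx_bufs[NTXDESC][TX_BUFSZ];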
Exercise 5. Perform the initialization steps described in the manual. Use section 13 as a reference for the registers the initialization process refers to, and section 3 for reference on the transmit descriptors and the transmit descriptor array. Be mindful of the alignment requirements on the transmit descriptor array and the restrictions on the length of this array. Since TDLEN must be 128-byte aligned and each transmit descriptor is 16 bytes, your transmit descriptor array will need some multiple of 8 transmit descriptors.
However, don't use more than 64 descriptors or our tests won't be able to test transmit ring overflow. For the TCTL.COLD, you can assume full-duplex operation. For TIPG, refer to the default values described in the relevant table of the manual. If you are using the course QEMU, you should see an "e1000: tx disabled" message when you set the TDT register, since this happens before you set TCTL.EN, and no further "e1000" messages.
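A rough sketch of the ring-setup part of that initialization, reusing the hypothetical constants and globals from the earlier sketches; TCTL and TIPG programming is deliberately left as a comment because those field values should be taken from the manual.

/* 'e1000' points at the memory-mapped register space. PADDR() stands for
   whatever your kernel provides to turn a kernel virtual address into a
   physical one (JOS has such a macro). */
static void tx_init(volatile uint32_t *e1000)
{
    /* Point the card at the descriptor ring and give its size in bytes. */
    e1000[E1000_TDBAL >> 2] = PADDR(tx_ring);
    e1000[E1000_TDBAH >> 2] = 0;
    e1000[E1000_TDLEN >> 2] = sizeof(tx_ring);   /* multiple of 128 */

    /* Head and tail start equal: the ring is empty. */
    e1000[E1000_TDH >> 2] = 0;
    e1000[E1000_TDT >> 2] = 0;

    /* TCTL (EN, PSP, CT, COLD) and TIPG still need to be programmed
       with the values the manual prescribes before transmit works. */
}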
Now that transmit is initialized, you'll have to write the code to transmit a packet and make it accessible to user space via a system call. To transmit a packet, you have to add it to the tail of the transmit queue, which means copying the packet data into the next packet buffer and then updating the TDT (transmit descriptor tail) register to inform the card that there's another packet in the transmit queue.
Note that TDT is an index into the transmit descriptor array, not a byte offset; the documentation isn't very clear about this. However, the transmit queue is only so big. What happens if the card has fallen behind transmitting packets and the transmit queue is full?
In order to detect this condition, you'll need some feedback from the E1000. Unfortunately, you can't just use the TDH (transmit descriptor head) register; the documentation explicitly states that reading this register from software is unreliable.
However, if you set the RS bit in the command field of a transmit descriptor, then, when the card has transmitted the packet in that descriptor, the card will set the DD bit in the status field of the descriptor. If a descriptor's DD bit is set, you know it's safe to recycle that descriptor and use it to transmit another packet.
What if the user calls your transmit system call, but the DD bit of the next descriptor isn't set, indicating that the transmit queue is full? You'll have to decide what to do in this situation. You could simply drop the packet. Network protocols are resilient to this, but if you drop a large burst of packets, the protocol may not recover.
Alternatively, you could tell the calling environment that it has to retry. This has the advantage of pushing back on the environment generating the data. Exercise 6. Write a function to transmit a packet by checking that the next descriptor is free, copying the packet data into the next descriptor, and updating TDT. Make sure you handle the transmit queue being full.
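One possible shape of that function, reusing the hypothetical constants and globals sketched earlier; the EOP, RS and DD bit values shown are the ones given in the 8254x manual for the legacy descriptor, but verify them yourself. The function simply reports failure when the ring is full and lets the caller decide whether to drop or retry.

#include <string.h>

#define TXD_CMD_EOP  0x01   /* end of packet   */
#define TXD_CMD_RS   0x08   /* report status   */
#define TXD_STAT_DD  0x01   /* descriptor done */

/* Returns 0 on success, -1 if the packet is too large or the ring is full. */
static int tx_packet(volatile uint32_t *e1000, const void *data, uint16_t len)
{
    uint32_t tail = e1000[E1000_TDT >> 2];
    struct tx_desc *td = &tx_ring[tail];

    if (len > TX_BUFSZ)
        return -1;

    /* The descriptor is reusable only once the card has reported it done.
       (Mark every descriptor DD at init time so the first pass through the
       ring does not look full.) */
    if (!(td->status & TXD_STAT_DD))
        return -1;

    memcpy(tx_bufs[tail], data, len);
    td->addr   = PADDR(tx_bufs[tail]);
    td->length = len;
    td->status = 0;                      /* clear DD before handing it over */
    td->cmd    = TXD_CMD_RS | TXD_CMD_EOP;

    /* Advancing the tail hands the descriptor to the card. */
    e1000[E1000_TDT >> 2] = (tail + 1) % NTXDESC;
    return 0;
}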
Now would be a good time to test your packet transmit code. Try transmitting just a few packets by directly calling your transmit function from the kernel. You don't have to create packets that conform to any particular network protocol in order to test this. You should see the corresponding "e1000: index" messages from QEMU. If you get lots of "e1000: tx disabled" messages, then you didn't set the transmit control register right. If you saw the expected "e1000: index" messages from QEMU, but your packet capture is empty, double-check that you filled in every necessary field and bit in your transmit descriptors (the E1000 probably went through your transmit descriptors, but didn't think it had to send anything).
Exercise 7. Add a system call that lets you transmit packets from user space. The exact interface is up to you. Don't forget to check any pointers passed to the kernel from user space. Now that you have a system call interface to the transmit side of your device driver, it's time to send packets. All subsequent bytes on the IPC page are dedicated to the packet contents. Be aware of the interaction between the device driver, the output environment and the core network server when there is no more space in the device driver's transmit queue.
The core network server sends packets to the output environment using IPC. If the output environment is suspended in a send-packet system call because the driver has no more buffer space for new packets, the core network server will block waiting for the output environment to accept the IPC call.
If this overflows your transmit ring, double check that you're handling the DD status bit correctly and that you've told the hardware to set the DD status bit using the RS command bit.
Just like you did for transmitting packets, you'll have to configure the E1000 to receive packets and provide a receive descriptor queue and receive descriptors. Section 3 of the manual describes the receive queue structure and receive descriptors. Exercise 9. Read the relevant parts of section 3. You can ignore anything about interrupts and checksum offloading (you can return to these sections if you decide to use these features later), and you don't have to be concerned with the details of thresholds and how the card's internal caches work.
The receive queue is very similar to the transmit queue, except that it consists of empty packet buffers waiting to be filled with incoming packets. Hence, when the network is idle, the transmit queue is empty (because all packets have been sent), but the receive queue is full of empty packet buffers. When the E1000 receives a packet, it first checks if it matches the card's configured filters (for example, to see if the packet is addressed to this E1000's MAC address) and ignores the packet if it doesn't match any filters.
Otherwise, the E1000 tries to retrieve the next receive descriptor from the head of the receive queue.
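For orientation, here is a sketch of the legacy receive descriptor as the manual lays it out (field names are conventional; verify the widths against the manual's table). In many driver designs the receive code then polls the descriptor just past RDT, consumes it once the card has set the DD bit, and hands the buffer back by advancing RDT.

#include <stdint.h>

#define RXD_STAT_DD   0x01   /* descriptor done: packet data is valid */
#define RXD_STAT_EOP  0x02   /* end of packet                         */

/* Legacy receive descriptor, 16 bytes; the card fills in length,
   checksum, status and errors when it writes a packet into the buffer. */
struct rx_desc {
    uint64_t addr;      /* physical address of the receive buffer */
    uint16_t length;
    uint16_t checksum;
    uint8_t  status;
    uint8_t  errors;
    uint16_t special;
} __attribute__((packed));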