One day we received an SOS call from Erik*, who works at a company that develops communication systems.

Erik’s team was testing their latest prototype hardware, including an NXP QorIQ processor and DDR4 memory. They experienced some timing issues and were not able to get their hardware up and running.

This being their first DDR4 design, they were not sure how to proceed with their hardware bring-up. They therefore started looking for advice from someone who had successfully done DDR4 troubleshooting before.

Designing in DDR4 for the first time

Erik’s team built their first design using DDR4 memory. They were careful to keep the DDR4 SDRAM implementation details close to reference designs. To avoid potential issues, they closely followed NXP application notes.

The PCB design was done by PCB consultants, who did everything they could to keep crosstalk under control using the Altium layout tools. To compensate for the lack of simulation tools, they chose a 12-layer PCB stack-up and paid close attention to line width, material parameters, line impedance, and termination resistors.

Before Erik called us, his team had already done quite an extensive investigation. They had identified some design flaws and implemented work-arounds that allowed them to boot their boards and run a few memory tests. The main issue at the time of the call was a failing clock centering/adjustment test.

Finding the root cause

After signing the NDA, we focused first and foremost on signal integrity analysis of the DDR4 interface. Based on the analysis and simulation results, we reached essentially the same conclusion that Erik’s team had drawn from the prototype: the address bus and the clock would most likely not work.

The nice thing about signal integrity analysis is that it does not just confirm that the problem is likely to occur. It also tells you where it comes from, purely based on the simulations.

This eye diagram for a typical case at a speed of 800 Mb/s shows a ringback crossing the threshold. That causes a failure in the address signal: it reduces the timing window and makes it impossible to access the DDR4 memory at the desired 800 Mb/s.
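To make the effect concrete, here is a minimal Python sketch of the idea. It is not the simulation flow used in the project. The function names, the sample voltages, and the 0.65 V threshold are all hypothetical values chosen for illustration, not DDR4 specification limits. It computes the bit period at a given data rate and counts how often a sampled rising edge falls back below the threshold after first exceeding it.

```python
def unit_interval_ns(rate_mbps: float) -> float:
    """Bit period in nanoseconds for a per-pin data rate in Mb/s."""
    return 1e3 / rate_mbps


def ringback_crossings(samples, threshold):
    """Count how often the waveform drops below the threshold
    after having been at or above it (illustrative check only)."""
    crossings = 0
    above = False
    for v in samples:
        if v >= threshold:
            above = True
        elif above:
            # The signal was valid-high and has fallen back below the threshold.
            crossings += 1
            above = False
    return crossings


# Hypothetical sampled rising edges, in volts (not measured data).
ringing_edge = [0.0, 0.3, 0.7, 0.9, 0.62, 0.8, 0.78]
clean_edge = [0.0, 0.3, 0.7, 0.9, 0.8, 0.78, 0.79]

print(unit_interval_ns(800))                   # bit period at 800 Mb/s
print(ringback_crossings(ringing_edge, 0.65))  # ringing edge
print(ringback_crossings(clean_edge, 0.65))    # clean edge
```

At 800 Mb/s the bit period is only 1.25 ns, so every excursion back across the threshold takes a noticeable share of the window in which the signal can be sampled reliably.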

What to do next time?

Simulations showed that the use of conventional vias causes the ringback. We advised Erik’s team to replace those with microvias and buried vias to minimize reflections from the vias. A redesign of the board is the only way to get rid of the issues completely. Based on its superior simulated performance, we advised which PCB stack-up to use for the next prototype.

Using the prototype against the odds

Did we give up on the prototype after concluding this? Far from it. Our software engineers made a board support package (BSP) that allowed full use of the board, with the concession of accessing the DDR4 memory at the lowest speed. Though the board does not perform as planned, it can be used for general hardware and software testing and verification. Erik’s team had their first application up and running within a day after we delivered the BSP.

Future steps

Erik had already accepted the need for a redesign based on his team’s initial investigations and the issues they found there. Our analysis helped to identify more issues in the existing prototype than they could have identified themselves. Although another prototype is a tough bullet to bite, Erik is happy knowing that the next prototype fixes the issues they found, and the issues we found for them. They gave us the assignment to do the next PCB layout. They trusted us to design electronics that DO work.

*For reasons of discretion, we do not use our customer’s real name.