Trouble-shooting by design for control systems networks
Networked instrumentation, automation and control systems now form the backbone of factory automation
and process control. The wider use of networked systems has brought a need for better education and training
in the network engineering aspects - indeed the physics - of system design, installation and maintenance.
Network systems trainer and consultant Andy Verwer provides a hands-on guide to diagnosing common, but
not necessarily obvious, problems in networked automation architecture and projects. If it is your job to keep
the factory network running, then you have to understand both the electrical and logical design issues.
IT REALLY GOES without saying that maintenance
staff require the correct tools for
health-checking and troubleshooting networked
systems; a multimeter is no longer sufficient
for fault-finding. The maintenance tool kit
would typically include gear for packet analysis
and waveform visualisation together with the
knowledge about its use... not so much the
connection and operation of these tools, but
rather the interpretation of their results so as
to develop a systematic diagnostic approach.
With it, it becomes possible to diagnose
network problems quickly, and locate the faults
that will inevitably occur during the lifetime
of the plant.
Unfortunately, the management within some
companies really does not understand the
argument for investment in knowledge-based
diagnostics. One example involves a UK
company that was recently visited. The
engineers wanted to get up to speed with their
fault-finding and trouble-shooting skills on
Profibus networks. Five engineers brought
along their work laptops, the first task being
to install some engineering software on their
machines. Unfortunately the software would
not install: the machines were all old, had out-of-date
service packs and very little memory
or disk space. It then transpired that the
company in question refused to purchase
laptops for their maintenance people. The only
machines that were available were hand-me-downs
from office staff higher up the pecking
order for new laptops.
This story is not unique. Maintenance staff
are often undervalued by management, particularly
in the UK where I am based. Across
Europe, the USA and elsewhere, the engineering
profession seems to enjoy higher status.
Maintenance and fault finding on networked
systems can be difficult since an engineering
problem at one network location can - and
frequently does - manifest itself at a different,
physically remote location. Separating out the
cause requires a systematic and logical
approach, a high level of skill and technical
training and certainly the use of modern
equipment and tools. It is false economy to
skimp on support for service personnel.
Many tools that are available for fault finding
on networked systems appear quite easy to use,
and many can even produce an automated
report with conclusions and recommendations.
However, network fault-finding doesn't readily
lend itself to button-push, one-stop diagnostics
and such systems rarely get to the root of
a problem. This is not to say that they aren't
of use: they can reliably confirm the presence
or absence of errors, and that things are
operating within specification. Unfortunately
intermittent faults that come and go may not
show up when carrying out a simple health
check. When these do occur, the best available
tool is the one between the ears of the maintenance
engineer. Given appropriate tools,
training and a systematic approach, they can
often diagnose and locate even really difficult faults.
The fieldbus and networking revolution has
made some aspects of fault-finding more
difficult. Communication faults on networks
are notoriously difficult to diagnose and locate,
particularly when the fault is intermittent. One
associated difficulty is that the device (or more
usually the connector or wiring) that is causing
errors in the network is not necessarily the
device where the fault symptoms exhibit themselves.
Reflections from the digital signals traversing
the network media are a good example of this
type of problem. Since any appreciable length
(>1m) of network cable acts as a transmission
line, any high speed digital network is prone to
reflection issues. These may have substantial
effects even on a 12Mbps Profibus (RS485)
network sector, a real possibility given the
dropline topology of this fieldbus. If Fast
Ethernet were the subject under discussion,
then the critical distances would be eight times
less. Thus a faulty Ethernet patchcord - or its
hidden termination in dysfunctional equipment
- could be enough to produce bizarre network
Back where it came from: A laboratory oscilloscope
screen grab showing what happens when a single pulse is
sent up an unterminated network cable stub attached to a
simulated network segment. The original single pulse has
now become two separate pulses which, in a real case,
would result in network data corruption - Frank Ogden
A working knowledge of transmission line
theory tells you much of what you need to
know about network trouble-shooting. The
network cable has a characteristic impedance
which depends on the cable capacitance and
inductance, i.e., it depends on the cable
construction. As the network signal passes
along the cable, it must charge up this capacitance
and push through the inductance to get
from one end to the other. The characteristic
impedance is measured in Ohms, but we should
not get confused between the cable resistance
and the characteristic impedance. For example
Ethernet cable has a characteristic impedance
of 100Ω, but its DC electrical resistance is
typically less than 0.188Ω/m. Profibus DP
cable has a characteristic impedance of 150Ω
and a resistance of less than 0.11Ω/m.
Reflections can be caused by any change in
impedance along the cable. For copper cables,
discontinuities can be a change in capacitance
or inductance of the cable, caused by tightly
bent cable or a connector with excessive capacitance,
etc. The discontinuity can cause the
transmitted signal to bounce back along the
cable like an echo resulting in repeated or
distorted signals. The devices that are most
affected by the reflection are often those that
are furthest from the cause. This is because
the delay is greater the further the reflection
has travelled. The longer the delay, the more
chance of corrupting the next bit that is
travelling down the cable.
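
As a rough illustration of why the delay matters, the sketch below compares the extra lag of an echo with the duration of one bit. It assumes a signal velocity of about 66% of the speed of light, a typical figure for copper cable that is not taken from this article, and a receiver sitting between the transmitter and the discontinuity. The function names are illustrative only.

```python
C_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def bit_time_ns(bit_rate_bps: float) -> float:
    """Duration of one bit in nanoseconds."""
    return 1e9 / bit_rate_bps


def echo_lag_ns(receiver_to_fault_m: float, velocity_factor: float = 0.66) -> float:
    """Extra delay of the echo relative to the direct signal at a receiver.

    The echo travels from the receiver on to the discontinuity and back,
    so it covers twice the receiver-to-fault distance.
    velocity_factor=0.66 is an assumed value, not a cable specification.
    """
    velocity_m_per_s = velocity_factor * C_LIGHT_M_PER_S
    return 2.0 * receiver_to_fault_m / velocity_m_per_s * 1e9


if __name__ == "__main__":
    bit_ns = bit_time_ns(12e6)  # 12Mbps Profibus
    for distance_m in (1.0, 5.0, 10.0):
        lag = echo_lag_ns(distance_m)
        print(f"{distance_m:>4.0f} m from fault: echo lags {lag:.1f} ns "
              f"({lag / bit_ns:.2f} bit times)")
```

Under these assumptions, a device about ten metres from the fault sees the echo arrive more than one bit time late at 12Mbps, whereas a device one metre away sees it well within the same bit.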
The largest discontinuity on a cable is
normally at the end of the cable, where the
impedance suddenly increases to infinity. The
end of the cable is thus like a brick wall and the
signal will reflect from the end back down the
way it came, causing problems. To avoid these
reflections from the end of the cable we use a
termination resistor, or more usually, a termination
resistor network with an impedance that
matches the cable characteristic impedance
built in to the equipment. For example, every
Ethernet connector socket on every device
incorporates a resistor that matches the cable.
Because Ethernet cables always connect from
one device to another (i.e., switch, router, PLC,
etc.) the termination should always be there
when plugged in. However a broken wire, short
circuit or corroded connector can cause reflections.
Many fieldbuses like Profibus use
multi-drop cabling where a cable can connect
many devices together. Here the termination
is a little trickier. The installer must switch on
the termination resistors at the ends of the
cable but not at devices in the middle.
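
The effect of termination can be put into numbers with the standard transmission line reflection coefficient, (ZL - Z0)/(ZL + Z0), where Z0 is the cable characteristic impedance and ZL is whatever the signal meets at the end. The following is a minimal sketch that assumes a purely resistive load; the function name and the 300Ω example value are illustrative, not taken from any standard.

```python
import math


def reflection_coefficient(z_load_ohm: float, z0_ohm: float) -> float:
    """Fraction of the incident voltage reflected at a resistive load.

    +1 means the whole signal bounces back (open end),
    -1 means it bounces back inverted (short circuit),
     0 means nothing is reflected (matched termination).
    """
    if math.isinf(z_load_ohm):
        return 1.0  # open circuit: limit of the formula as ZL grows without bound
    return (z_load_ohm - z0_ohm) / (z_load_ohm + z0_ohm)


if __name__ == "__main__":
    z0 = 150.0  # Profibus DP cable characteristic impedance
    cases = {
        "matched termination": 150.0,
        "open end (no termination)": math.inf,
        "short circuit": 0.0,
        "mismatched 300 ohm load (illustrative)": 300.0,
    }
    for label, z_load in cases.items():
        print(f"{label}: {reflection_coefficient(z_load, z0):+.2f}")
```

A termination switched on in the middle of a segment, or left off at the end, moves the result away from zero, which is why the switch positions matter.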
Reflections may be caused by incorrect termination,
poor connections, water ingress, or
damaged or sharply bent cable. Reflections can
also occur on fibre optic transmission, again
caused by bent or damaged fibres. The location
of the fault can be misleading, and engineers
will often replace the wrong devices while
chasing the fault.
A system with problem reflections may be
diagnosed with waveform visualisation tools
such as an oscilloscope or time domain reflectometer.
Measurements can then pinpoint the
cause of the reflection on the cable. But this
exercise is not trivial. Some training is required
on how to make the measurements and
interpret the results.
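
The arithmetic behind locating a fault from an echo is simple, even if taking a clean measurement is not. The sketch below converts the round-trip time read from an oscilloscope or reflectometer into a distance along the cable. The velocity factor of 0.66 is an assumed, typical value; in practice it should come from the cable data sheet, and the function name is illustrative.

```python
C_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_to_fault_m(round_trip_ns: float, velocity_factor: float = 0.66) -> float:
    """Distance from the measuring point to the discontinuity.

    The pulse travels out to the fault and back again, so only half
    of the measured round-trip time counts towards the distance.
    """
    velocity_m_per_s = velocity_factor * C_LIGHT_M_PER_S
    return 0.5 * round_trip_ns * 1e-9 * velocity_m_per_s


if __name__ == "__main__":
    for echo_ns in (20.0, 100.0, 500.0):
        print(f"echo after {echo_ns:.0f} ns -> fault roughly "
              f"{distance_to_fault_m(echo_ns):.1f} m away")
```

An error in the assumed velocity factor translates directly into an error in the estimated position, so the result is best treated as a place to start looking.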
Power supply problems
Power supplies can be a source of errors that
are often initially blamed on the communication
network. A surprisingly common cause of communication problems is
power supply failure. For example a loose screw
terminal on a 24V power supply can cause
intermittent device failure whenever vibration
occurs. The symptom might be that several
devices drop off the network at intervals.
People often tend to blame the network for
such a failure.
A thing of beauty does not necessarily make for good engineering: A problem frequently seen on industrial installations
concerns the use of beautifully coiled earth wires. Such earth wiring actually introduces a significant inductance into the
earthing cable which is bad news for interference: inductance produces problematic impedance at high frequencies.
Another frequently seen problem involves
power supply overload. In this case, the story
starts with the system designer!
A cabinet or panel on the system might incorporate
several devices that require 24V. The
system designer quite correctly sums up the
current requirements for these devices and
selects an appropriately sized power supply.
For example if the current required is, say, 4A
then probably a 5A power supply will do the
job. The power supply is installed and the
system commissioned and checked out as
But then again, ugly isn't always good: Cable separation is equally important and numerous guidelines and IEC
standards provide excellent rules and information on how to lay out and segregate cables to avoid EM coupling.
A few months later someone decides that an
extra bit of kit is needed in the cabinet,
perhaps a switch, modem or some other
additional piece of electronics. The new device
requires a 24V supply at a couple of hundred
mA. Where can we easily get this from? Ah,
the system already has a 24V supply with spare
capacity - problem solved.
Unfortunately, a while later, we start getting
occasional or intermittent network faults. What
is actually happening is that the power supply
is now working too close to its current limit.
When two or more digital outputs switch on
simultaneously, particularly if the loads are
inductive, the inrush current can take the
device load current over the power supply limit.
The result is that the 24V collapses and devices
fall off the network.
Of course, on most modern systems, the
network devices will fail-safe, that is the
outputs would normally switch off automatically.
The load current therefore reduces,
allowing the power supply to recover and
devices to reappear on the network. This
scenario looks just like a network failure, but
is actually caused by power supply failure.
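
The overload scenario can be checked on paper before the extra device is fitted. The sketch below uses the 5A supply, 4A original load and 200mA addition from the example above; the 1.5A inrush figure is a made-up value for illustration, since real inrush depends on the loads being switched, and the function name is hypothetical.

```python
def supply_headroom_a(rating_a: float,
                      steady_loads_a: list[float],
                      inrush_extra_a: float = 0.0) -> float:
    """Spare current (in amps) left on a supply; negative means overload.

    steady_loads_a: continuous current drawn by each device.
    inrush_extra_a: additional short-term current when outputs switch on
                    together (an assumed figure, to be measured or taken
                    from data sheets in a real design).
    """
    return rating_a - (sum(steady_loads_a) + inrush_extra_a)


if __name__ == "__main__":
    original = supply_headroom_a(5.0, [4.0])
    with_addition = supply_headroom_a(5.0, [4.0, 0.2])
    switching = supply_headroom_a(5.0, [4.0, 0.2], inrush_extra_a=1.5)
    print(f"as designed:            {original:+.2f} A")
    print(f"with extra device:      {with_addition:+.2f} A")
    print(f"outputs switching (assumed inrush): {switching:+.2f} A")
```

The steady-state figures look comfortable in both cases; it is only when the assumed inrush is included that the headroom goes negative, which matches the intermittent nature of the fault.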
The maintenance technician is well advised to
look for common factors when several devices
intermittently fail. Are they in the same
cabinet? Are they on the same power supply?
Are they on the same segment? And so on. In
addition, it is really worthwhile specifying
power supplies that have significant excess
capability, perhaps even 100% over current
Power supplies, devices and control cabinets
all require proper earthing. The earthing (aka
grounding) is there not only to provide
protection, but also to help avoid interference
problems. A cable shield when properly earthed
can help to reduce electrostatic pickup.
Unfortunately there is a lot of incorrect information
around about earthing. Some old
unbalanced systems can give earth loop
problems when the cable screen (which is effectively
also the signal reference) is earthed at
both ends. However, modern balanced transmission
systems like Ethernet and Profibus
require the screen to be earthed at every
A problem frequently seen on industrial installations
concerns the use of beautifully coiled
earth wires. Such earth wiring actually introduces
a significant inductance into the earthing cable
which is bad news for interference: inductance
has a high impedance at high frequencies. Thus
the high frequency interference that we are
trying to get rid of cannot flow to earth.
Earthing cables should never be coiled.
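
The reason is that the impedance of an inductance rises in proportion to frequency (2πfL). The sketch below shows the scale of the effect; the 10µH inductance is an assumed value chosen purely for illustration, since the real figure depends on the number and size of the coils.

```python
import math


def inductive_reactance_ohm(frequency_hz: float, inductance_h: float) -> float:
    """Magnitude of the impedance of an ideal inductance: 2 * pi * f * L."""
    return 2.0 * math.pi * frequency_hz * inductance_h


if __name__ == "__main__":
    coil_inductance_h = 10e-6  # assumed 10 uH for a coiled earth wire
    for label, freq_hz in (("50 Hz mains", 50.0),
                           ("1 MHz interference", 1e6),
                           ("10 MHz interference", 10e6)):
        ohms = inductive_reactance_ohm(freq_hz, coil_inductance_h)
        print(f"{label}: {ohms:.3f} ohm")
```

With these assumed numbers the coil is practically invisible at mains frequency yet presents hundreds of ohms at 10MHz, so it still works as a safety earth while doing very little to carry away high frequency interference.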
Cable separation is equally important and
numerous guidelines and IEC standards provide
excellent rules and information on how to
lay out and segregate cables.
Many of the mistakes that are made in laying
out networked automation systems can be
traced to basic design stage decisions. Further,
designers rarely receive feedback from operators
and maintenance staff as to how the system
performs. Errors that are made on one project
are often carried over to others for this reason.
Almost unbelievably, practical experience shows
that designers are often the least well-trained
among the ranks of engineering staff involved
with automation and control systems.
Certified installer training
This training is widely accepted as the minimum standard of training for
anyone who is working in Profibus or Profinet systems at a technical level.
One-day certified Installer courses are widely available and offer a cost
effective route to avoiding costly errors in layout and installation. The course
teaches the basic principles of the technology and covers the basic layout,
installation and testing of the network physical layer. Surprisingly, this course
is not just for installers; it also provides essential basic training for system
designers, maintenance and all engineering staff involved at a technical
level. Additional days can be added to extend the basic training for
maintenance, design and engineering staff.
When a new automation project is started, there are key design
decisions that must be made at the concept stage, generally based upon:
• System cost;
• System dependability;
• System performance.
Cost is often seen as the procurement cost, that is the cost to design,
purchase, install and commission the system. However, the total costs
should really be based on the whole lifecycle of the plant, not just
procurement. Lifecycle costs include those of maintenance, fault-finding,
loss of production during down time, etc.
Dependability is the availability of the system to deliver the required
services, i.e., up-time. Availability depends critically on reliability, but
equally important, it also relates to ease of fault diagnosis, location
and time to repair. All parts of a complex system can fail.
Redundant systems that can continue to operate in the event of a
failure can give high availability, but only when combined with good
diagnosis and rapid repair. Dual-redundant systems are no longer
redundant when a failure has occurred in one channel. We need to rapidly
diagnose, locate and repair the fault in order to maintain availability.
Also of course, single point failure in any part of the system (common
to both channels) can take down the whole system. The design of
properly redundant systems with minimum exposure to common cause
failures is complex and requires considerable planning and thought.
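
The link between repair time and availability can be illustrated with the usual steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below is a simplification: the hour figures are invented for illustration, and the redundant-pair calculation assumes the two channels fail independently, which is exactly the assumption that common cause failures break.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures
    and mean time to repair (which includes diagnosis and location)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_pair_availability(channel_availability: float) -> float:
    """Availability of two channels where either one can carry the load.

    Assumes the channels fail independently; a shared single point of
    failure is not modelled here and would lower the real figure.
    """
    unavailability = 1.0 - channel_availability
    return 1.0 - unavailability ** 2


if __name__ == "__main__":
    mtbf = 1000.0  # illustrative figure, in hours
    for mttr in (10.0, 1.0):
        single = availability(mtbf, mttr)
        pair = redundant_pair_availability(single)
        print(f"MTTR {mttr:>4.1f} h: single {single:.6f}, redundant pair {pair:.8f}")
```

Cutting the time to diagnose and repair improves both figures, and the redundant pair only delivers its advantage if the failed channel is found and fixed before the second one goes.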
Systems that provide rapid diagnosis of faults and which allow fast
repair will have high availability. This can be achieved by informed
system design with built-in monitoring, health-checking and fault
location facilities. Ideally, these will include error reporting and notification
so that operators and engineering maintenance staff are
positively aware of failures and performance degradation.
Automatic reporting and logging of system and device diagnostics is
available on a wide range of devices and systems, but is often underused
or even disabled. Engineers should be aware of these features,
specify their inclusion and provide reporting and logging facilities. It
is perhaps not necessary to report the details of every fault; simply
knowing that a potential problem has developed or performance has
degraded is usually enough. The engineer can then explore the problem
and get details using appropriate tools. However, it is important to get
some sort of message on a screen, or perhaps generate an email to some
responsible person who will act upon it.
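
A general warning of this kind need not be elaborate. The sketch below builds a short email when a monitored figure crosses a limit, using only the Python standard library. Everything specific in it is an assumption for illustration: the retry-rate metric, the threshold of five per minute, the addresses and the function name are hypothetical, and how the figure is read from the network is left out.

```python
from email.message import EmailMessage
from typing import Optional


def build_alert(network_name: str,
                retries_per_minute: float,
                threshold: float = 5.0) -> Optional[EmailMessage]:
    """Return a warning email if the retry rate exceeds the threshold.

    Returns None when the network looks healthy, so no message is sent.
    The threshold and addresses are placeholder values.
    """
    if retries_per_minute <= threshold:
        return None
    message = EmailMessage()
    message["Subject"] = f"Possible degradation on {network_name}"
    message["From"] = "monitor@example.com"
    message["To"] = "maintenance@example.com"
    message.set_content(
        f"{network_name} is reporting {retries_per_minute:.1f} retries per minute "
        f"(limit {threshold:.1f}). Please investigate with appropriate tools."
    )
    return message


if __name__ == "__main__":
    alert = build_alert("Packing line Profibus", 12.0)
    if alert is not None:
        print(alert)
```

The message deliberately says only that something needs attention; in a live system it could be handed to `smtplib.SMTP.send_message`, or the same text shown on an operator screen.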
The cost of putting full diagnostic reporting into a SCADA system can
be significant, but the cost of simply putting a general message on a
screen that there might be a problem or degradation in performance is
Modern intelligent devices that communicate by fieldbus or Industrial
Ethernet normally have extensive diagnostics that can report device
specific peripheral errors and communication problems. Profibus and
Profinet, in particular, have very well defined and standardised diagnostics
which can clearly show communication and peripheral errors.
In addition, standardised Identification and Maintenance functions
provide an easy-to-use system for reporting the health status of
Profibus/net devices. Standardised, manufacturer independent diagnostics
and status reporting are in many ways the "Jewel in the Crown" of
The role of diagnostics
Missing out diagnostic reporting from SCADA systems in order to reduce
the procurement costs really is a false economy in terms of whole life
cycle costs and the downtime. So why do automation projects sometimes
Even when the layout and installation of the network adheres to the
published specifications and guidelines, maintenance personnel can still
encounter problems when dealing with faults, replacing devices,
extending or altering the network. There are well-documented specifications,
guidelines and rules. These need to be understood by system
designers, installers and commissioning engineers. It is just as important
that the reasons for these rules are understood. This lessens the risk of
people breaking or bypassing the rules.
Trained for the job: Actuator Sensor/Interface training at
Unilever, Port Sunlight
Network monitor assistance
A number of new monitoring devices have been
introduced over the last year or so. These can
provide 24/7 network monitoring and reporting
on one or more networks. An example is the
COMbricks unit introduced by Procentec.
COMbricks is a modular repeater and gateway
that can be used on Profibus and Profinet
systems. Up to four independent Profibus
networks can be monitored so that any errors
or degradation in performance can be reported
on SCADA screens using OPC server functionality
or by email via SMTP. Such devices are
revolutionising the way that we design and
maintain networked automation systems.
System Design training
The first step to a successful project is training.
Profibus International has developed high
quality accredited training for installers, system
designers, commissioning engineers and maintenance
staff. Installer, commissioning &
maintenance training is well established.
Further, many industry sectors specify that
their staff, contractors and sub-contractors
must be appropriately trained.
A single day of training can teach how to
commission, health check and troubleshoot
Profibus/net systems. The course offers a
systematic approach to fault finding in a
practical and hands-on environment. An
additional half day of training can also be
carried out on-plant using the training
equipment. This is a really valuable exercise
which builds up the confidence of the trainees
and often identifies faults on the plant that
were previously unknown.
A new two day Certified System Designer
course has been developed this year by
Profibus International. The course provides a
top-down approach to designing a modern
automation and control system and helps
managers and designers to make the correct
decisions from the project outset. The course
is applicable to all sectors of industry from
factory automation to process control.
Andy Verwer is director of Verwer Training & Consultancy
Ltd, technical officer for the UK Profibus Group and a
leading member of the PI working groups for training,
installation and design.