January 10th, 2024
Unified Namespaces (UNS) are an accelerating trend across the manufacturing & industrial space. Designed to serve as a data highway, their purpose is to help manufacturers become more agile & data-driven: being able to adjust & upgrade their industrial equipment, hardware & software at speed with reduced cost, whilst creating the foundation for a unified data platform that reflects all operations.
While the UNS is reasonably well-defined as a concept in manufacturing IT, there has been far less focus on how one can be implemented in an industrial environment. In this article, we dive deep into this exact issue, touching upon the different axes of decisions that solution architects, IT managers & manufacturing organizations as a whole have to consider as they embark on their UNS journeys.
We previously did a deep-dive into what a UNS is in our last article here. As a quick recap, a UNS serves as the main communication centre for all other industrial applications, software and hardware to interact with. These can be PLCs (programmable logic controllers) that interact with machines, SCADA, MES, ERPs - any system that impacts the operations of a factory. They can publish their own data to the UNS for other systems to utilize as well as subscribe to data from the UNS directly.
The idea is that each industrial system only needs to communicate with the UNS to be able to receive and send all the information it needs to perform its function. The alternative - and the status quo today - is for an industrial system to support direct integrations with multiple other services, whether hardware or software. This status quo approach leads to ‘integration hell’, where manufacturers have to spend significant money & time to set up, manage and often fix issues with multiple systems that rarely work well with one another.
The benefits of implementing a UNS within manufacturing are vast. Manufacturers can benefit from faster decision making, greater operational visibility, and reduced overheads & technical debt. The result is higher production, lower downtime and greater efficiency.
While a UNS sounds ‘simple’ in theory, in practice there is a lot of nuance behind the implementation. Incorporating elements of a UNS is a multi-stakeholder challenge touching teams across production, automation, process, IT and more. Each team often has their own criteria, requirements & concerns which need to be accommodated when designing a solution.
To help with this solution architecting decision, we frame the implementation of a UNS along various ‘axes’ that manufacturers need to consider. There is no right or wrong answer on any of these (we certainly do have our view though!), nor are they binary choices - but they should all be considered in the context of a broader UNS implementation.
A common assumption is that a UNS is a single service that all other industrial systems communicate with. Often this means the event broker is deployed onto a single server on the factory floor. The reason is that many OT industrial services (PLCs, sensors etc) should not, by design, have direct access to the Internet, so that they are protected from malicious actors.
The challenge with this approach is that it introduces a single point of failure into the OT environment. If for whatever reason the event broker server breaks down (e.g. it is overloaded), the entire factory operation can be impeded depending on which services are reliant on the UNS. This may be mission-critical if the purpose of the UNS touches upon feedback to operator applications, KPI monitoring or closed loop workflows where automation equipment takes actions based on events processed by the UNS.
This risk can be mitigated to some degree with high availability infrastructure where there are fallback servers in case there is a fault with the primary UNS server. Whilst effective, this does require appropriate IT expertise to manage, monitor & set-up: there is an overhead cost.
A different approach is to distribute the workload of a UNS across multiple servers & devices, so that redundancy can be baked in where it matters most. For example, you may have an MES (i.e. manufacturing execution system) providing instructions to a set of PLCs that in turn control machinery; and separately you may be collecting data from some sensors on the production line to run some analysis (say energy tracking). Instead of combining both the writeback of instructions from the MES to PLCs and the data extraction from the sensors on one server serving as the UNS, you can run the workloads on two different devices/servers. The MES-to-PLC communication could take place on a high-availability server, and the sensor data gathering could be accomplished with an edge gateway device. If for whatever reason the latter fails or becomes overwhelmed, the core MES-to-PLC workflow will remain intact.
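As a rough illustration of this split, the Python sketch below assigns each data flow to a deployment target depending on whether it is mission-critical, and then checks which flows survive if one target fails. The `Flow` type, the flow names and the target labels (`ha-server`, `edge-gateway`) are hypothetical and only mirror the MES and sensor example above; a real deployment would be driven by your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A hypothetical description of one data flow handled by the UNS."""
    name: str
    source: str
    destination: str
    mission_critical: bool


def assign_target(flow: Flow) -> str:
    """Place mission-critical flows on the HA server, everything else on an edge gateway."""
    return "ha-server" if flow.mission_critical else "edge-gateway"


flows = [
    Flow("mes-instructions", "MES", "PLCs", mission_critical=True),
    Flow("energy-tracking", "line sensors", "analytics", mission_critical=False),
]

# Map each flow name to the device/server it runs on.
placement = {flow.name: assign_target(flow) for flow in flows}


def surviving_flows(failed_target: str) -> list:
    """Return the flows that keep running when one target goes down."""
    return [name for name, target in placement.items() if target != failed_target]
```

With this placement, losing the edge gateway only takes the energy-tracking flow offline; the MES-to-PLC flow keeps running on the high-availability server.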
The advantage of such an approach is that a manufacturer can gently introduce more event coverage & data flows into a UNS over time without necessarily having to build out a high availability centralised infrastructure from day one. The resources of IT teams can therefore be focused on making sure mission-critical operations have high availability and are separated from lower-priority UNS events.
Ferry’s take: We are big proponents of decentralised UNS infrastructures as we believe it brings the best of both worlds. With the right tooling (especially around managing deployments and devices), IT teams can better allocate their time to ensuring the most important services stay online whilst mitigating risk from other services that could overwhelm a centralized UNS infrastructure. In terms of hardware, select software solutions that are hardware agnostic and ensure that you are not forced into plug-and-play solutions which lock you to a particular vendor.
Much of the literature on UNS suggests using MQTT as the main communication protocol within a UNS. There are many reasons for this: it is a lightweight efficient protocol supporting asynchronous publish/subscribe communications, and it is scalable & flexible.
However, it is not the only protocol in use. The industrial protocol space is highly fragmented, in part due to hardware providers creating their own protocols / methods as well as more recent open-source efforts. There’s EtherNet/IP, Profinet, OPC UA, BACnet, Modbus TCP, HTTPS, Kafka and many more!
Usually industrial hardware and software will primarily support integrations with a subset of these protocols, which may not include MQTT. The workaround for this has been to purchase additional software (e.g. Kepware, Litmus) to help with this IIoT connectivity. If the UNS only supports MQTT as its communication protocol, then any connected service that does not support MQTT natively will need a translation layer for every event it publishes or subscribes to.
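To make the idea of a translation layer concrete, here is a minimal Python sketch that turns raw register values (as a Modbus-style device might expose them) into topic and JSON payload pairs that a UNS could publish. The register addresses, topic names, scale factors and the `translate` function are all assumptions made up for illustration; no real Modbus or MQTT library is used.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterMapping:
    address: int   # hypothetical holding-register address
    topic: str     # hypothetical UNS topic for this value
    scale: float   # raw register value * scale = engineering value
    unit: str


REGISTER_MAP = [
    RegisterMapping(40001, "site1/packaging/line1/temperature", 0.1, "degC"),
    RegisterMapping(40002, "site1/packaging/line1/speed", 1.0, "units/min"),
]


def translate(raw_registers: dict) -> list:
    """Turn raw register values into (topic, JSON payload) pairs ready to publish."""
    messages = []
    for mapping in REGISTER_MAP:
        if mapping.address not in raw_registers:
            continue  # register was not read this cycle, so skip it
        value = round(raw_registers[mapping.address] * mapping.scale, 3)
        payload = json.dumps({"value": value, "unit": mapping.unit})
        messages.append((mapping.topic, payload))
    return messages


# Example: one polling cycle that read both registers.
messages = translate({40001: 215, 40002: 120})
```

Every non-MQTT service would need an adapter of this kind, which is the overhead the paragraph above describes.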
Ferry’s take: We believe that a UNS should natively embed multiple communication protocols within its function, which abstracts away the issue of protocol translation. For the most common protocols, which cover the vast majority of cases, individual industrial services ideally shouldn’t have to be paired with a separate service to be able to communicate with the UNS.
Most explanations of a UNS refer to a UNS being simply an event broker: a highway of data that can flow between multiple services. What is not discussed is what data those events actually represent.
Let us frame this in the context of an example. Say we had a production line with a machine that packages candy. The machine has a PLC attached to it that emits an event for each piece of candy packaged, and a state for whether the machine is on or off.
With a purely event-driven UNS architecture, the PLC may emit to the UNS both types of data (i.e. the count and the on/off state) at frequent intervals. Any service that subscribes to the UNS can then receive both these pieces of data. But often data by itself without some additional computation, or cross-referencing with other data items, has limited informative value.
For example, you can calculate the cumulative number of candies produced by tracking the count of package events over time. But this requires the application doing this calculation to track the state of production (i.e. the total count of candy packaged). With a purely event-driven UNS, this computation would be the responsibility of the services receiving event data from the UNS. Equally, if you wanted to combine on/off data with a cumulative count to deduce batch cycles, or begin to calculate efficiency metrics, the same argument applies.
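The sketch below shows what this looks like from a subscriber's point of view, assuming a hypothetical event format (a dict with a `type` field). Each consuming service has to hold its own running state, so two services (here labelled as an MES view and an ERP view) end up duplicating the same logic.

```python
class CandyCounter:
    """Consumer-side state: each subscribing service keeps its own running total."""

    def __init__(self) -> None:
        self.total = 0
        self.machine_on = False

    def on_event(self, event: dict) -> None:
        # The event shape is an assumption: {"type": "state", "on": bool}
        # or {"type": "packaged", "count": int} (count defaults to 1).
        if event["type"] == "state":
            self.machine_on = event["on"]
        elif event["type"] == "packaged":
            self.total += event.get("count", 1)


events = [
    {"type": "state", "on": True},
    {"type": "packaged"},
    {"type": "packaged"},
    {"type": "packaged"},
    {"type": "state", "on": False},
]

# Two independent subscribers, each repeating the same computation.
mes_view = CandyCounter()
erp_view = CandyCounter()
for event in events:
    mes_view.on_event(event)
    erp_view.on_event(event)
```

Both views arrive at a total of three, but only because each one implemented and maintained the same counting logic separately.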
The challenges with delegating compute logic to the individual services connected to the UNS (which is a pure event broker) are many:
Duplicated effort: Multiple services need to individually compute the same metrics / analysis. This not only leads to repeated effort, but more importantly, differing services (e.g. an ERP vs. an MES) will require completely different adjustments to their configuration to be able to support such calculations. This can be a time-sink for automation & IT teams.
Maintenance & testing overhead: A symptom of the above, IT & automation teams will need to allocate significant time to test, debug and monitor each service separately to ensure that any computations and calculations are handled appropriately.
Data quality & validation: Event data can often be corrupted or be transferred in an incorrect format (e.g. a string instead of an integer). This in turn can generate errors in services that consume data from a UNS. Some more advanced MQTT brokers allow for real-time data validation (see HiveMQ’s Data Hub) but pure MQTT broker implementations may not support this out of the box.
Compute constraints: Very often services subscribing to a UNS will simply not have the capability, bandwidth or compute power to be able to carry out their computation. PLCs for example, being control systems, should not carry out additional compute lest it interfere with their control cycle for the operation they are monitoring. SCADA and MES systems might not be able to be configured to run specific analysis or calculate certain metrics.
No central metric store: There is no organizational overview of all the metrics, data items and features that the industrial operations of the factory are generating. This slows down cross-team collaboration, and makes it harder for both production & IT teams to identify the source-of-truth of data. This isn’t the same as storage of data! We’ll come to that in the next section.
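As an illustration of the data quality point in the list above, the following Python sketch validates a hypothetical "packaged count" payload before it is passed on to consumers. The payload shape and the `validate_count_event` function are assumptions for this example; this is not how any particular broker (including HiveMQ’s Data Hub) implements validation.

```python
def validate_count_event(payload: dict) -> tuple:
    """Check a hypothetical 'packaged count' payload; return (is_valid, reason)."""
    if "count" not in payload:
        return False, "missing field: count"
    count = payload["count"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(count, int) or isinstance(count, bool):
        return False, f"count must be an integer, got {type(count).__name__}"
    if count < 0:
        return False, "count must be non-negative"
    return True, "ok"


incoming = [{"count": 3}, {"count": "3"}, {}, {"count": -1}]

# Only well-formed payloads are forwarded; the rest are kept with a reason.
accepted = []
rejected = []
for payload in incoming:
    is_valid, reason = validate_count_event(payload)
    if is_valid:
        accepted.append(payload)
    else:
        rejected.append((payload, reason))
```

Without a check like this somewhere in the pipeline, every consuming service has to defend itself against the string `"3"` arriving where an integer was expected.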
A different approach is to embed computation and analytics into the UNS itself. In this framework, a UNS not only collects events from other services, but also generates metrics, KPIs and analysis that other services in turn can consume. This greatly simplifies ongoing maintenance as well as implementation, and provides a single information source over what data is being collected, analysed and transmitted across all services that operate within the industrial environment.
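A minimal sketch of this idea, assuming a toy in-memory publish/subscribe broker written for illustration (the `ComputeBroker` class and the topic names are hypothetical, not a real product API): the broker derives the cumulative total once and publishes it on its own topic, so subscribers consume the finished metric instead of recomputing it.

```python
from collections import defaultdict

PACKAGED_TOPIC = "line1/packager/packaged"      # raw events from the PLC
TOTAL_TOPIC = "line1/packager/total_packaged"   # derived metric from the UNS


class ComputeBroker:
    """Toy in-memory broker that also computes and publishes a derived metric."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)
        self._total = 0

    def subscribe(self, topic: str, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, value) -> None:
        # Deliver the event to anyone subscribed to this topic.
        for handler in self._subscribers[topic]:
            handler(topic, value)
        # Derive the cumulative total once, centrally, and publish it too.
        if topic == PACKAGED_TOPIC:
            self._total += value
            self.publish(TOTAL_TOPIC, self._total)


broker = ComputeBroker()
mes_seen = []
erp_seen = []
broker.subscribe(TOTAL_TOPIC, lambda topic, value: mes_seen.append(value))
broker.subscribe(TOTAL_TOPIC, lambda topic, value: erp_seen.append(value))

for _ in range(3):
    broker.publish(PACKAGED_TOPIC, 1)
```

Here neither subscriber holds any counting state; both simply receive the running total that the broker maintains in one place.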
Ferry’s take: A UNS should be far more than an event-broker. The ideal UNS should be able to consume data from other services, compute any analytics or inference required by the organization, and then provide that information to other services that require it. Computational capability mixed with a centralised metrics repository should be core tenets of a UNS.
The earliest proponents of the UNS suggested that it should be the source of truth for a manufacturer as well as a real-time event broker. The idea is that any service or person could always go to the UNS to find any form of data they need to fulfil their requirements.
To be a source of truth, you need to be able to query data, and that means having a service that is capable of data storage. The reality is that the amount of data that manufacturers produce is simply staggering. An AWS investigation from a couple of years ago found that manufacturers generate more than 1800 petabytes of data per year, twice as much as the next closest industry.
Data storage is not a manufacturer’s core competency. To be able to efficiently store, scale and manage such vast volumes of data requires the right expertise and vendors to be able to provide that capability - and given the scale of data generation, this likely requires a cloud-based provider (e.g. AWS, Azure, Databricks, Snowflake). Sources of truth for this data are usually provided by data lakes / data warehouses / databases - systems specifically designed for large-scale data storage and efficient compute.
This fundamentally is not the purpose of what a UNS is designed to be: a data highway that provides real-time information to all relevant services. A UNS can absolutely write to and read from a data warehouse / lake for additional information if it requires, but for appropriate separation of functionality, storage should be managed by services that are designed for precisely that.
Ferry’s take: A UNS should only be concerned with real-time data. The UNS can interact with stores of data for appropriate querying or writing of that data, but it should not serve as a data storage mechanism by itself. If the UNS additionally serves as a source of truth around what data is being collected & analyzed, it need not be the storage of data itself. In fact, a UNS can maintain information about where data is being stored across the organization to provide visibility over data residency.
It is very common for manufacturing operations to be separated from the Internet with a firewall, or to be air-gapped. The reason for this is primarily security - to ensure that factory equipment is protected from potential malicious actors who, if able to damage machinery remotely, can cause significant downtime.
The challenge that many manufacturers face is how to balance this risk with the benefits of centralizing data in cloud IT systems for other teams in the organization to make use of (e.g. sales, data, finance). Depending on the services that are required to communicate with a UNS (e.g. a cloud-based ERP, or a data warehouse), this may mean that the UNS itself needs to have some ability to communicate with the Internet.
How can this be accommodated? There are a variety of architectures, but the key premise is that any UNS that communicates with an external network should, at the very minimum, rely on outbound communications only. This means that the UNS will not permit any inbound connections or requests from outside the factory network, but it can itself initiate communications with select external services to request or pass on data.
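One way to picture the outbound-only premise is a small store-and-forward component on the factory side that only ever initiates sends and never listens for incoming connections. The Python sketch below assumes a hypothetical `OutboundForwarder` class and an injected `send` callable that stands in for the actual outbound connection (for example an HTTPS or MQTT client); here it is replaced by a fake so the example runs offline.

```python
from collections import deque


class OutboundForwarder:
    """Buffers messages locally and pushes them out; it never accepts inbound requests."""

    def __init__(self, send, max_buffer: int = 1000) -> None:
        self._send = send  # callable(message) -> bool, True if delivered
        # When the buffer is full, deque(maxlen=...) drops the oldest message.
        self._buffer = deque(maxlen=max_buffer)

    def enqueue(self, message: dict) -> None:
        self._buffer.append(message)

    def flush(self) -> int:
        """Try to deliver buffered messages in order; stop at the first failure."""
        sent = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break  # link is down, keep the message for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._buffer)


# Fake outbound link used only for this example.
link = {"up": False}
delivered = []


def fake_send(message: dict) -> bool:
    if link["up"]:
        delivered.append(message)
    return link["up"]


forwarder = OutboundForwarder(fake_send)
forwarder.enqueue({"total_packaged": 1})
forwarder.enqueue({"total_packaged": 2})

first_attempt = forwarder.flush()   # link is down, nothing leaves the factory
link["up"] = True
second_attempt = forwarder.flush()  # link is back, buffered messages go out in order
```

The factory side decides when to connect and what to send; nothing outside the network can reach in to request data.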
Some examples of this are AWS Greengrass (i.e. the device runtime) linking to AWS IoT Core (the cloud service), as well as Azure IoT Edge communicating with Azure’s IoT Hub. Both AWS Greengrass and Azure IoT Edge allow the UNS (or any edge device) to have secure outbound-only communications with their related cloud IoT service, which they use for receiving and transferring data.
In general, security is a whole topic of its own when it comes to UNS implementation, and we’ll dive deeper into the varying architectures manufacturers can use in a future article!
Ferry’s take: If the UNS requires cloud connectivity, you can separate the infrastructure into (at least) two parts. The first can reside in the locked-down OT network with no external Internet access that can connect to all factory operations. It can then have a one-way outbound connection to the second part of the UNS infrastructure which is a classic edge device/server. This part can be Internet connected using an appropriate edge runtime which has outbound-only connections to the relevant cloud provider or external services which you are using.
Historic implementations of industrial software - namely ERP, MES and SCADA systems - have often required long implementation cycles with extensive configuration required to attempt to map the software’s capability to the manufacturer’s operations. This can often take months, and we have seen up to 2 years in some cases.
As with all digital transformation projects, ROI is key. Our advice is always to start with a particular part of the manufacturing process where a lack of data visibility or analysis is impeding production best practices. The best way forward is then to implement only the components of a UNS needed to solve that part of the puzzle.
Over time, you can then integrate more services with your UNS, and expand its scope to cover more of the data flows in your industrial operations. This can follow where ROI dictates. The aim is not to have a UNS cover all data events on the shop floor, but rather to cover the most important parts that the manufacturer needs to have full visibility and clarity over.