Chapter 0: Welcome & How to Use This Guide
Welcome to the Data Plane Microservice onboarding guide. This document will take you from zero knowledge of this codebase to being a productive contributor.
What is the Data Plane?
The data-plane (also called eric-pc-up-data-plane) is the packet-forwarding engine inside Ericsson's Packet Core Gateway (PCG) product. It processes millions of packets per second (GTP tunneling, NAT, DPI, QoS enforcement, service chaining), all in a cloud-native Kubernetes pod using DPDK for high-speed I/O.
How to Navigate
- Use the sidebar on the left to jump between chapters
- The progress bar at the top shows how far you've read
- Quizzes (marked with 📝) appear every few chapters – test yourself!
- Code blocks show real patterns from the codebase
- Colored boxes: ℹ️ Info, ⚠️ Warning, 💡 Tip
Repository Structure at a Glance
data-plane/
├── main/                   # Main application (dp.c, main.c, config parsers)
├── pktio/                  # Packet I/O subsystem (DPDK, AF_XDP, TAP backends)
├── vrf/                    # Virtual Routing & Forwarding, GRE, FIB
├── sc/                     # Service Chaining engine
├── cgnat/                  # Carrier-Grade NAT
├── ipfix/                  # IPFIX flow export (NAT logging)
├── upf/                    # UPF (User Plane Function) session handling
├── rib-client/             # RIB (Routing Information Base) client
├── punt/                   # Punt path (slow-path packets to controller)
├── contest/                # Container integration tests
├── up-common/              # Shared libraries (EVL, mbox, RCU, timers, etc.)
├── cdpi-main/              # DPI (Deep Packet Inspection) integration
├── scripts/                # CI helper scripts
├── dpi-packages/           # DPI heuristics packages
├── Makefile                # Developer shortcuts
├── CMakeLists.txt          # Top-level CMake (in main/)
├── Jenkinsfile*            # CI/CD pipeline definitions
├── ruleset2.0.yaml         # Bob build rules
└── common-properties.yaml  # Docker images, repos, versions
Chapter 1: 3GPP User Plane Fundamentals
The Split Architecture
In modern 3GPP networks (4G EPC and 5G Core), the control plane and user plane are separated (CUPS – Control and User Plane Separation):
Key Concepts You Already Know (from AMF)
- AMF handles NAS signaling, registration, mobility – you know this well
- SMF manages sessions and tells the UPF what to do via PFCP
- UPF (our data-plane) does the actual packet forwarding
What the UPF Does
| Function | Description |
|---|---|
| GTP-U tunneling | Encap/decap GTP tunnels on N3 (from gNB) and N9 (between UPFs) |
| Packet Detection | Match packets to PDRs (Packet Detection Rules) from SMF |
| Forwarding | Apply FARs (Forwarding Action Rules) – forward, drop, buffer |
| QoS Enforcement | Apply QERs (QoS Enforcement Rules) – rate limiting, marking |
| Usage Reporting | Apply URRs (Usage Reporting Rules) – volume/time measurement |
| NAT/CGNAT | Carrier-Grade NAT for IPv4 address sharing |
| DPI | Deep Packet Inspection for traffic classification |
| Service Chaining | Steer traffic through service functions (firewall, etc.) |
4G vs 5G Terminology
| 4G (EPC) | 5G (5GC) | Our Code |
|---|---|---|
| SGW-U | UPF (I-UPF) | data-plane |
| PGW-U | UPF (PSA) | data-plane |
| S1-U, S5/S8 | N3, N9 | GRE/GTP tunnels in pktio |
| APN | DNN | Network Instance |
| Bearer | QoS Flow | PDR/FAR/QER |
Chapter 2: PFCP Protocol & Session Management
What is PFCP?
PFCP (Packet Forwarding Control Protocol, 3GPP TS 29.244) is the protocol between the SMF (control plane) and UPF (user plane). Think of it as "the boss telling the worker what to do with packets."
PFCP Session Lifecycle
The Rules Model (PDR → FAR/QER/URR)
Each PFCP session contains rules that tell the UPF how to handle packets:
- PDR (Packet Detection Rule) – "Match packets with these criteria" (source interface, GTP TEID, IP filter, etc.)
- FAR (Forwarding Action Rule) – "Do this with matched packets" (forward, drop, buffer, encapsulate in GTP)
- QER (QoS Enforcement Rule) – "Apply this rate limit / QoS marking"
- URR (Usage Reporting Rule) – "Count bytes/packets and report when threshold hit"
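To make the rules model concrete, here is a minimal sketch of how a PDR can reference a FAR. The struct and field names are invented for illustration; the real structures live in sc/ and upf/ and carry far more fields:

```c
/* Illustrative only: the real structures live in sc/ and upf/. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { FAR_FORWARD, FAR_DROP, FAR_BUFFER } far_action_t;

typedef struct {
    uint32_t     far_id;
    far_action_t action;
    bool         gtp_encap;  /* encapsulate toward gNB (N3)? */
    uint32_t     encap_teid; /* TEID to use when encapsulating */
} far_t;

typedef struct {
    uint32_t precedence; /* lower value wins when several PDRs match */
    uint32_t match_teid; /* uplink: match on GTP TEID */
    uint32_t far_id;     /* index of the FAR to apply on match */
} pdr_t;

/* Linear scan for clarity; the real engine uses hash/flow tables. */
static const far_t* pdr_lookup(const pdr_t* pdrs, size_t n_pdrs,
                               const far_t* fars, uint32_t teid)
{
    for (size_t i = 0; i < n_pdrs; i++) {
        if (pdrs[i].match_teid == teid)
            return &fars[pdrs[i].far_id];
    }
    return NULL; /* no PDR matched: drop or punt */
}
```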
UPF-specific session handling lives in the upf/ directory. The session data structures and PDR matching logic are in the service chaining engine (sc/). Configuration from PFCP arrives via the CM Mediator path (JSON over REST) in PCG mode, or via PIAF/ICD in EPG (IPOS) mode.
How Config Reaches data-plane
The data-plane does NOT speak PFCP directly in PCG mode. Instead:
- SMF sends PFCP to the UP Control (UPC) microservice
- UPC translates PFCP rules into a JSON configuration model
- UPC pushes config to data-plane via CM Mediator (REST/JSON)
- data-plane parses JSON and installs forwarding rules
// In main/dp_cm_mediator.c β handles config from CM Mediator
// In main/dp_config_json_parser.c β parses the JSON config model
// In main/dp_config_json_validator.c β validates before applying
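As a taste of what those parsers do, here is a minimal sketch using jansson (one of the project's 3PP dependencies, see Chapter 18). The "network-instance" key is hypothetical; the real config model is far larger:

```c
/* Sketch of the parsing style in dp_config_json_parser.c, using the
 * jansson 3PP library. The "network-instance" key is hypothetical. */
#include <jansson.h>
#include <stdio.h>

static int parse_config(const char* text)
{
    json_error_t err;
    json_t* root = json_loads(text, 0, &err);
    if (root == NULL) {
        fprintf(stderr, "config parse error at line %d: %s\n",
                err.line, err.text);
        return -1;
    }

    json_t* ni = json_object_get(root, "network-instance");
    if (json_is_string(ni))
        printf("network instance: %s\n", json_string_value(ni));

    json_decref(root); /* drop our reference to the parsed tree */
    return 0;
}
```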
Chapter 3: Where Data Plane Fits in Ericsson PCG
The PCG Product
PCG (Packet Core Gateway) is Ericsson's cloud-native 5G UPF product. It runs on Kubernetes and consists of multiple microservices:
Key Neighboring Microservices
| Service | Role | Interface to DP |
|---|---|---|
| UP Control (UPC) | PFCP termination, session management | JSON config via CM Mediator |
| Routing Agent | Route management, BGP | gRPC (route announcements) |
| CM Mediator | Configuration distribution | REST (JSON patches) |
| Log Transformer | Log collection | stdout/syslog |
| PM Server | Metrics collection | Prometheus scrape |
| KV DB (Redis) | Session state, geo-redundancy | Redis protocol |
| Kafka | Event streaming (IPFIX, etc.) | Kafka producer |
EPG vs PCG vs VPN-GW
The same data-plane binary is used in multiple products:
- PCG – Cloud-native 5G UPF (Linux containers, DPDK)
- EPG – Evolved Packet Gateway on IPOS (bare-metal blades, WITH_IPOS_SDK)
- VPN-GW – VPN Gateway variant
You will see the compile flags WITH_IPOS_SDK and EPG_BUILD throughout the code. For PCG development (what you'll mostly do), these are OFF. Code inside #if defined(WITH_IPOS_SDK) is EPG-specific.
📝 Quiz 1 – Context & Fundamentals
Q1: What protocol does the SMF use to communicate with the UPF?
Q2: In PCG mode, how does configuration reach the data-plane?
Q3: What does a PDR (Packet Detection Rule) do?
Q4: What does the compile flag WITH_IPOS_SDK indicate?
Q5: What is the 5G equivalent of the 4G term "APN"?
Chapter 4: High-Level Architecture Overview
Single-Process, Multi-Threaded
The data-plane is a single process with many threads, each pinned to a specific CPU core. This is a classic DPDK pattern: avoid context switches, avoid locks, maximize cache locality.
Key Architectural Principles
- Run-to-completion: Each packet is fully processed on one CPU core (no hand-offs mid-pipeline)
- Lock-free data structures: RCU (Read-Copy-Update) for config updates while forwarding continues
- Shared-nothing where possible: Per-CPU data structures to avoid cache bouncing
- Mailbox for cross-CPU communication: When CPUs must communicate, they use lock-free mailboxes
- Event-driven controller: The controller thread uses EVL (event loop) for async I/O
The Global g_dp Structure
The entire data-plane state is rooted in a single global: dataplane_t* g_dp. This is the "god object" that holds references to all subsystems:
// From main/dp.h
typedef struct dataplane dataplane_t;
extern dataplane_t* g_dp;
dataplane_t*
dataplane_create(
cb_timer_framework_cpu_t* cb_timer_fws,
dp_options_t* options,
random_t* random,
evl_t* evl, // Event loop for controller
evl_t* evl_dbproxy, // Event loop for DB proxy
...
mbox_t* mbox, // Mailbox for inter-CPU msgs
rcu_t* rcu, // Read-Copy-Update
pktio_t* pktio, // Packet I/O subsystem
...
);
Chapter 5: CPU Roles
CPU Role Assignment
Each CPU core in the data-plane pod is assigned a specific role. The assignment is dynamic based on available cores:
| Role | Count | Responsibility |
|---|---|---|
| Controller | 1 | Configuration, REST API, timers, OAM, session management |
| Ingress | 1+ | Receive packets from NIC, classify, spray to workers |
| Worker (Forwarding) | Many | Full packet processing pipeline (the "fast path") |
| Input | 1 | Handle punted packets (ARP, BFD, ICMP, control protocols) |
| Output | 1 | Transmit packets to NIC after processing |
The role-assignment logic lives in main/ipos/assign_dynamic_cpu_roles.c and main/linux/src/assign_dynamic_cpu_roles.c. The number of worker cores scales with the pod's CPU allocation.
Controller Thread Details
The controller runs an EVL event loop and handles:
- REST API (health checks: /alive, /ready)
- Configuration reception and parsing (CM Mediator)
- Timer management (session timeouts, keepalives)
- OAM (Operations, Administration, Maintenance)
- Redis/DB operations (session state persistence)
- Kafka producer (IPFIX events, logs)
- PFCP relay (in some modes)
Worker Thread Details
Workers run a tight poll loop (no sleeping!) that:
- Polls the NIC RX queue (via DPDK/eventdev)
- Classifies the packet (GTP? IP? ARP?)
- Looks up the session/PDR
- Applies service chain (DPI, NAT, QoS, firewall)
- Encapsulates if needed (GTP, GRE)
- Transmits on the TX queue
// Simplified worker loop concept (from pktio/src/pktio.c)
while (running) {
nb_pkts = rte_eth_rx_burst(port, queue, pkts, BURST_SIZE);
for (i = 0; i < nb_pkts; i++) {
classify_packet(pkts[i]);
apply_service_chain(pkts[i]);
transmit_packet(pkts[i]);
}
mbox_poll(mbox); // Check for config updates
}
Chapter 6: Packet Processing Pipeline
The Fast Path
The "fast path" is the optimized packet processing pipeline that handles the vast majority of traffic without involving the controller:
Slow Path (Punt)
Some packets can't be handled on the fast path and are "punted" to the Input/Controller CPU:
- ARP requests/replies
- BFD (Bidirectional Forwarding Detection) keepalives
- ICMP (ping, unreachable)
- LACP (Link Aggregation Control Protocol)
- First packet of a new flow (if buffering is needed)
- Packets requiring fragmentation/reassembly
Network Instance (NWID)
A Network Instance (NI, also called NWID in code) is the data-plane's equivalent of a VRF. Each DNN/APN maps to a network instance with its own:
- Routing table (FIB)
- IP address space
- GRE tunnels
- Service chain configuration
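A conceptual sketch of what a network instance bundles together (field names invented for illustration, not the actual structs):

```c
/* Conceptual only: field names invented for illustration. */
#include <stdint.h>

typedef struct fib fib_t;               /* LPM routing table (pktio/fib/) */
typedef struct gre_tunnel gre_tunnel_t; /* see vrf/gre/ */
typedef struct service_chain service_chain_t;

typedef struct {
    uint32_t         nwid;     /* network instance ID */
    char             dnn[64];  /* the DNN/APN this NI serves */
    fib_t*           fib;      /* per-NI routing table */
    gre_tunnel_t**   tunnels;  /* GRE tunnels toward the transport net */
    int              n_tunnels;
    service_chain_t* chain;    /* per-NI service chain configuration */
} network_instance_t;
```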
Chapter 7: Traffic Spraying
Why Spraying?
With multiple worker CPUs, incoming traffic must be distributed. Spraying is how packets are assigned to worker cores. The goal: even load distribution while keeping related packets on the same core (for stateful processing).
Spraying Modes
| Mode | Hash Key | Use Case |
|---|---|---|
| Session | GTP TEID | All packets of one session β same core |
| Flow | 5-tuple (src/dst IP, ports, proto) | Per-flow affinity |
| GRE | GRE key | GRE tunnel-based distribution |
| Packet | Round-robin or RSS | Maximum parallelism (stateless) |
The implementation lives in pktio/src/ingress_spraying.c and pktio/src/ingress_spraying.h. The spraying logic uses the five-tuple hash (pktio/src/five_tuple.c) and flow tables (pktio/src/flow_table.c).
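A toy illustration of flow-mode spraying, assuming a simple FNV-1a hash as a stand-in for the real one: hash the 5-tuple, map the hash to a worker, and the flow keeps its core affinity.

```c
/* Toy flow-mode spraying: same 5-tuple -> same worker core. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
} five_tuple_t;

/* FNV-1a, a stand-in for the real hash in five_tuple.c */
static uint32_t fnv1a(uint32_t h, const void* data, size_t len)
{
    const uint8_t* p = data;
    while (len--)
        h = (h ^ *p++) * 16777619u;
    return h;
}

static uint32_t five_tuple_hash(const five_tuple_t* t)
{
    uint32_t h = 2166136261u;
    h = fnv1a(h, &t->src_ip, sizeof t->src_ip);
    h = fnv1a(h, &t->dst_ip, sizeof t->dst_ip);
    h = fnv1a(h, &t->src_port, sizeof t->src_port);
    h = fnv1a(h, &t->dst_port, sizeof t->dst_port);
    h = fnv1a(h, &t->proto, sizeof t->proto);
    return h;
}

static int pick_worker(const five_tuple_t* t, int n_workers)
{
    return (int)(five_tuple_hash(t) % (uint32_t)n_workers);
}
```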
Eventdev (Hardware-Assisted Spraying)
On supported NICs, DPDK's eventdev framework can do spraying in hardware/firmware, offloading the ingress CPU. See pktio/src/eventdev.c.
// From pktio/src/eventdev.h β eventdev configuration
// Eventdev distributes packets to worker cores based on flow_id
// This avoids software-based spraying overhead
📝 Quiz 2 – Architecture
Q1: What is the "run-to-completion" model?
Q2: Which CPU role handles configuration and REST APIs?
Q3: What mechanism is used for lock-free config updates while forwarding continues?
Q4: Which spraying mode uses the GTP TEID as hash key?
Q5: What kind of packets get "punted" to the slow path?
Chapter 8: Packet I/O (PKTIO)
Overview
The pktio/ directory is the packet I/O subsystem β the interface between the data-plane and the network. It abstracts multiple backends behind a common API.
PKTIO Backends
| Backend | File | Use Case |
|---|---|---|
| libpio | pktio_libpio.c | Production: DPDK-based high-performance I/O |
| linux | pktio_linux.c | AF_XDP / raw sockets for non-DPDK environments |
| pktsock | pktio_pktsock.c | Testing: packet sockets (used in SFT/contest) |
| tap | pktio_tap.c | Testing: TAP devices for local testing |
| native | pktio_native.c | Native Linux networking |
Key PKTIO Concepts
- Packet classification (classifier.c) – Determines packet type (GTP, GRE, plain IP, ARP, etc.)
- Five-tuple extraction (five_tuple.c) – Extracts src/dst IP, ports, protocol for flow hashing
- Flow table (flow_table.c) – Tracks active flows for stateful processing
- NWID table (nwid_table.c) – Maps tunnel keys to network instance IDs
- Encapsulation (pktio_encap*.c) – GRE, MPLS, IP encap/decap
- Packet flow expiry (packet_flow_expiry.c) – Timeout inactive flows
// The main PKTIO public API (pktio/include/pktio/pktio.h)
// ~50K lines – this is the largest header in the project
// Key functions:
pktio_t* pktio_create(...);
void pktio_start(pktio_t* pktio);
void pktio_stop(pktio_t* pktio);
int pktio_rx_burst(pktio_t* pktio, pktio_packet_t* pkts, int max);
int pktio_tx_burst(pktio_t* pktio, pktio_packet_t* pkts, int n);
FIB (Forwarding Information Base)
The pktio/fib/ subdirectory contains the FIB β the routing lookup table used during packet forwarding. It's a longest-prefix-match (LPM) structure optimized for fast lookups.
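Conceptually, a FIB lookup is longest-prefix match. The toy below does it with a linear scan just to show the semantics; the real FIB uses an optimized LPM structure:

```c
/* Toy LPM: linear scan for clarity, not the real data structure. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t prefix;   /* IPv4 prefix, host byte order for simplicity */
    uint8_t  len;      /* prefix length in bits, 0..32 */
    uint32_t next_hop;
} fib_entry_t;

static uint32_t prefix_mask(uint8_t len)
{
    return len == 0 ? 0u : 0xFFFFFFFFu << (32 - len);
}

/* Return the next hop of the longest matching prefix, or 0 if none. */
static uint32_t fib_lookup(const fib_entry_t* fib, size_t n, uint32_t dst)
{
    const fib_entry_t* best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(fib[i].len);
        if ((dst & m) == (fib[i].prefix & m) &&
            (best == NULL || fib[i].len > best->len))
            best = &fib[i];
    }
    return best ? best->next_hop : 0;
}
```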
Chapter 9: UPF Engine & DPI
UPF Module (upf/)
The upf/ directory handles UPF-specific session logic β PDR installation, FAR application, URR counting. It works closely with the service chaining engine.
DPI (Deep Packet Inspection)
DPI is handled by the cdpi-main/ directory and the external dpisf library. It classifies application-layer traffic (YouTube, Netflix, WhatsApp, etc.) using:
- Heuristics – Pattern matching on packet payloads
- SNI inspection – TLS Server Name Indication
- DNS correlation – Map DNS responses to flows
The dpi-packages/ directory and scripts/download_heuristics_package.sh handle downloading DPI signature databases. These are versioned separately from the main binary.
DPI Thread Controller
DPI can run on dedicated threads to avoid impacting forwarding latency. See main/dp_config_dpi_thread.c and main/dp_config_dpi_thread_controller.c.
Chapter 10: VRF, GRE & FIB
VRF Module (vrf/)
The VRF (Virtual Routing and Forwarding) module manages network instances – isolated routing domains, each with its own FIB, GRE tunnels, and interfaces.
// VRF source files (from vrf/CMakeLists.txt)
src/config.c // VRF configuration handling
src/config_delta_builder.c // Incremental config changes
src/controller.c // VRF controller (control plane side)
src/engine.c // VRF engine (data plane side)
src/routes.c // Route management
src/mac_learning_mgr.c // MAC address learning
src/cre_route.c // CRE (Cloud Routing Engine) routes
GRE Tunnels
GRE (Generic Routing Encapsulation) tunnels connect the data-plane to the transport network. Each network instance can have multiple GRE tunnels. The GRE subsystem lives in vrf/gre/.
FIB Controller (vrf/fib-ctrl/)
The FIB controller manages route installation/removal. Routes come from:
- Static configuration (CM Mediator)
- Routing Agent (dynamic routes via gRPC)
- Connected routes (interface addresses)
- CRE (Cloud Routing Engine)
Route Announcements
The data-plane announces its routes to the Routing Agent so that external routers know how to reach UE addresses. See main/dp_config_route_announcements.c.
Chapter 11: CGNAT, Service Chaining & Protocols
CGNAT (cgnat/)
Carrier-Grade NAT translates private IPv4 addresses to shared public IPs. Key components:
- IP Pool Manager – Allocates public IPs and port ranges
- Translation Table – Maps (private IP:port) ↔ (public IP:port)
- Service Functions – Per-protocol NAT (TCP, UDP, ICMP)
- Garbage Collector – Reclaims expired translations
- IPFIX logging – Reports NAT events for lawful intercept compliance
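A conceptual sketch of a NAT44 translation entry and the garbage collector's expiry check (names invented for illustration; the real table is a hash table sized for millions of entries):

```c
/* Conceptual NAT44 entry: names invented for illustration. */
#include <stdint.h>
#include <time.h>

typedef struct {
    uint32_t priv_ip;  uint16_t priv_port; /* subscriber side */
    uint32_t pub_ip;   uint16_t pub_port;  /* allocated from the IP pool */
    uint8_t  proto;                        /* TCP, UDP or ICMP */
    time_t   last_seen;                    /* updated on every packet */
} nat_entry_t;

/* The garbage collector reclaims entries idle past the timeout. */
static int nat_entry_expired(const nat_entry_t* e, time_t now, time_t timeout)
{
    return (now - e->last_seen) > timeout;
}
```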
Service Chaining (sc/)
The Service Chaining engine is the heart of packet processing. It executes an ordered list of "service functions" on each packet:
Key files in sc/:
- sc.c – Main service chain controller
- engine.c – Fast-path service chain execution
- session.c – Session data management
- sf.c – Service function framework
- api.c – Service chain API (170K lines!)
- confighandler.c – Configuration handling
- tromboning.c – Traffic steering to external service functions
IPFIX (ipfix/)
IPFIX (IP Flow Information Export) generates NAT logging records. Required for lawful intercept: when CGNAT is used, operators must log which subscriber had which public IP:port at what time.
Protocol Handling
The data-plane handles several control protocols on the slow path:
| Protocol | Purpose |
|---|---|
| ARP | Address resolution on N6 interfaces |
| BFD | Fast failure detection on GRE tunnels |
| GTP-U | User plane tunneling (N3/N9) |
| LACP | Link aggregation (bonded interfaces) |
| ICMP/ICMPv6 | Ping, unreachable, PMTUD |
| TCP | NAT state tracking, RST generation |
📝 Quiz 3 – Subsystems
Q1: Which PKTIO backend is used in production with DPDK?
Q2: What does the NWID table map?
Q3: What is the primary purpose of CGNAT in the data-plane?
Q4: Why is IPFIX logging required when CGNAT is used?
Q5: What does "tromboning" mean in the service chaining context?
Chapter 12: Module Lifecycle API
The Module Pattern
Every subsystem in data-plane follows a strict module lifecycle pattern defined in MODULE_GUIDELINES.md. This ensures controlled startup and shutdown.
State Machine
API Contract
// Every module exposes these functions:
// Constructor – allocate and initialize
my_module_t* my_module_create(dependencies...);
// Start – begin async operations, call on_started_cb when ready
void my_module_start(my_module_t* m, on_started_cb, cb_ctx);
// Stop – gracefully shut down, call on_stopped_cb when done
void my_module_stop(my_module_t* m, on_stopped_cb, cb_ctx);
// Destructor – free all resources (only after stopped)
void my_module_delete(my_module_t* m);
Key Rules
- Start must not fail. If resources are missing, retry until available.
- Start is called only once. Use module_state_t + abort_unless to detect violations.
- Stop must be callable before start completes. It inhibits the on_started_cb.
- After on_stopped_cb, the module is safe to delete. All timers, defers, and transactions must be complete.
- Dependencies are injected in create/start. Never expose them via getters.
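A sketch of how a module might enforce the rules above with module_state_t and abort_unless (the names come from the guidelines; the definitions here are illustrative):

```c
/* Illustrative definitions: the real module_state_t / abort_unless
 * live in the shared code; this only shows the intent. */
#include <stdlib.h>

typedef enum {
    MODULE_CREATED,
    MODULE_STARTING,
    MODULE_STARTED,
    MODULE_STOPPING,
    MODULE_STOPPED
} module_state_t;

#define abort_unless(cond) do { if (!(cond)) abort(); } while (0)

typedef struct {
    module_state_t state;
} my_module_t;

void my_module_start(my_module_t* m)
{
    abort_unless(m->state == MODULE_CREATED); /* start only once */
    m->state = MODULE_STARTING;
    /* ...kick off async work; on_started_cb moves us to STARTED... */
}

void my_module_delete(my_module_t* m)
{
    /* delete only before start, or after on_stopped_cb */
    abort_unless(m->state == MODULE_CREATED || m->state == MODULE_STOPPED);
    free(m);
}
```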
Chapter 13: Control-Engine Pattern & Mailbox
Control-Engine Split
Many modules have two parts:
- Controller – Runs on the controller CPU, handles config, manages state
- Engine – Runs on worker CPUs, processes packets at line rate
Example: VRF has vrf/src/controller.c (control) and vrf/src/engine.c (data path).
Mailbox (mbox)
The mailbox is the inter-CPU communication mechanism. It's a lock-free SPSC (single-producer, single-consumer) or MPSC queue.
// Sending a message from controller to worker
mbox_msg_t msg = {
.type = MBOX_MSG_CONFIG_UPDATE,
.data = new_config_ptr
};
mbox_send(worker_mbox, &msg);
// Worker polls mailbox in its main loop
while (mbox_recv(my_mbox, &msg)) {
handle_message(&msg);
}
Mbox Dispatcher
The mbox_dispatcher provides a higher-level API for synchronous and asynchronous cross-thread tasks:
// Execute a function on a specific worker thread
mbox_dispatcher_dispatch(dispatcher, target_cpu, my_function, arg);
// Used for operations that need specific thread context
// (e.g., timer start/stop must happen on the timer's owning thread)
Thread Context Rules
- Timer start/stop must happen on the timer's owning thread; use the mbox_dispatcher to get there
- Configuration state is owned by the controller thread; workers receive updates via mbox messages or RCU
- Never block a worker's poll loop waiting on another thread
Chapter 14: RCU, Async Config & EVL Batch Iterator
RCU (Read-Copy-Update)
RCU allows the controller to update configuration while workers continue reading the old version without locks:
// Writer (controller thread)
new_config = build_new_config(...);
rcu_assign_pointer(global_config, new_config);
rcu_synchronize(rcu); // Wait for all readers to finish with old
free(old_config);
// Reader (worker thread) – no locks needed!
rcu_read_lock(rcu);
config = rcu_dereference(global_config);
use_config(config);
rcu_read_unlock(rcu);
Async Configuration Model
From CONFIG_GUIDELINES.md: Configuration is applied asynchronously using a two-step model:
- Step 1 – Build config: Parent module builds a module-specific config object
- Step 2 – Apply config: Child module applies it via deferred jobs
// Every module exposes:
void module_set_config(
module_t* module,
module_config_t* config,
on_done_cb_t on_done, // Called when ALL deferred work completes
void* on_done_ctx,
void* on_done_arg
);
// Key rules:
// - All deferred work MUST be started inside set_config
// - on_done MUST be called exactly once, after all work completes
// - on_done may NOT be interrupted or cancelled
EVL Batch Iterator
For "large number objects" (e.g., thousands of network instances), you must NOT iterate in a blocking loop. Use the EVL Batch Iterator:
// BAD – blocks the event loop for too long
for (i = 0; i < 10000; i++) {
    configure_network_instance(ni[i]); // Blocks!
}
// GOOD β process one at a time with low-priority defers
evl_batch_iterator_config_t cfg = {
.defer_priority = EVL_DEFER_PRIORITY_LOW,
.batch_size = 1,
.on_item = configure_one_ni,
.on_done = all_ni_configured_cb
};
evl_batch_iterator_start(&cfg, ni_list, count);
📝 Quiz 4 – Code Patterns
Q1: What happens if a module's start function encounters missing resources?
Q2: When is it safe to call module_delete()?
Q3: What is the purpose of RCU in the data-plane?
Q4: Why must large number objects be processed with low-priority defers?
Q5: What mechanism is used for cross-CPU communication?
Chapter 15: up-common & EVL (Event Loop)
What is up-common?
The up-common/ directory is a git submodule containing shared libraries used across all UP (User Plane) microservices. Think of it as the "standard library" for the data-plane ecosystem.
Key Libraries in up-common
| Library | Purpose |
|---|---|
| evl | Event loop (epoll-based async I/O) |
| mbox | Lock-free inter-thread mailbox |
| rcu | Read-Copy-Update for lock-free reads |
| smp | SMP utilities (CPU pinning, barriers) |
| cbtimer | Callback-based timer framework |
| tw-client | Timer wheel client (efficient bulk timers) |
| dstrace | Distributed tracing (OpenTelemetry) |
| container | Data structures (hash maps, lists, etc.) |
| net | Network utilities (inet_addr, etc.) |
| http | HTTP client/server |
| kafka | Kafka producer/consumer |
| msgbus | Message bus abstraction (NATS) |
| evlsock | EVL-integrated socket library |
| tls | TLS configuration and management |
| logging | Structured logging |
| metrics | Prometheus metrics |
| string | String utilities |
| memory | Memory allocation wrappers |
| db-proxy | Database proxy (Redis) |
| db-tracker | Database connection tracking |
EVL (Event Loop)
EVL is the async I/O framework used by the controller thread. It wraps Linux epoll and provides:
// Create an event loop
evl_t* evl = evl_create();
// Register a file descriptor for read events
evl_add_fd(evl, fd, EVL_READ, my_callback, my_arg);
// Schedule a deferred task (runs on next iteration)
evl_defer(evl, my_deferred_fn, arg);
// Schedule a low-priority deferred task
evl_defer_low(evl, my_low_prio_fn, arg);
// Run the event loop (blocks until stopped)
evl_run(evl);
Chapter 16: Timers, Defers & Clients
Timer Framework (cbtimer)
The cbtimer library provides callback-based timers. Each CPU has its own timer framework instance (no cross-CPU timer operations without mbox).
// Create a timer
cb_timer_t* timer = cb_timer_create(fw, my_timeout_cb, arg);
// Start with 5 second timeout
cb_timer_start(timer, duration_from_seconds(5));
// Cancel
cb_timer_stop(timer);
// Destroy
cb_timer_delete(timer);
Timer Wheel (tw-client)
For scenarios with thousands of timers (e.g., session timeouts), the timer wheel is more efficient than individual cbtimers. It batches timer expirations.
Deferred Work
Defers are "do this later" tasks scheduled on the event loop:
- evl_defer() – Normal priority, runs soon
- evl_defer_low() – Low priority, yields to important work
- evl_defer_idle() – Runs only when nothing else is pending
REST Client & HTTP
The http library provides both server (for health endpoints) and client (for CM Mediator communication). The rest_client is a higher-level wrapper for making REST API calls.
tw_client (Timer Wheel Client)
Used for bulk session timers. The data-plane can have millions of active sessions, each with timeout timers. The timer wheel handles this efficiently with O(1) start/stop operations.
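To see why timer-wheel start/stop is O(1), here is a toy single-level wheel: starting a timer just links it into the slot for its expiry tick, and stopping unlinks it. The real tw-client is hierarchical and batches expirations; this only shows the principle:

```c
/* Toy single-level wheel; the real tw-client is hierarchical. */
#include <stddef.h>

#define WHEEL_SLOTS 256

typedef struct toy_timer {
    struct toy_timer *prev, *next; /* doubly linked for O(1) stop */
    void (*cb)(void* arg);
    void* arg;
    unsigned slot;                 /* remembered so stop can unlink */
} toy_timer_t;

typedef struct {
    toy_timer_t* slots[WHEEL_SLOTS];
    unsigned     now; /* current tick */
} toy_wheel_t;

/* O(1) start: link into the slot for the expiry tick. */
static void wheel_start(toy_wheel_t* w, toy_timer_t* t, unsigned ticks)
{
    t->slot = (w->now + ticks) % WHEEL_SLOTS; /* toy: ticks < WHEEL_SLOTS */
    t->prev = NULL;
    t->next = w->slots[t->slot];
    if (t->next) t->next->prev = t;
    w->slots[t->slot] = t;
}

/* O(1) stop: unlink, no search needed. */
static void wheel_stop(toy_wheel_t* w, toy_timer_t* t)
{
    if (t->prev) t->prev->next = t->next;
    else         w->slots[t->slot] = t->next;
    if (t->next) t->next->prev = t->prev;
}

/* Advance one tick and fire everything that expires now. */
static void wheel_tick(toy_wheel_t* w)
{
    w->now = (w->now + 1) % WHEEL_SLOTS;
    toy_timer_t* t = w->slots[w->now];
    w->slots[w->now] = NULL;
    while (t) {
        toy_timer_t* next = t->next;
        t->cb(t->arg);
        t = next;
    }
}
```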
Chapter 17: ET Test Framework & dstrace
ET (Ericsson Test) Framework
All unit tests and SFTs use the ET framework – a C test framework from up-common. It provides:
// Test file structure
#include "et/et.h"
ET_TEST(my_module, test_create_and_delete) {
my_module_t* m = my_module_create(deps);
ET_ASSERT(m != NULL);
my_module_delete(m);
}
ET_TEST(my_module, test_start_stop) {
my_module_t* m = my_module_create(deps);
my_module_start(m, on_started, ctx);
// ... drive event loop until started ...
my_module_stop(m, on_stopped, ctx);
// ... drive event loop until stopped ...
my_module_delete(m);
}
ET_TEST_SUITE(my_module) {
ET_RUN_TEST(my_module, test_create_and_delete);
ET_RUN_TEST(my_module, test_start_stop);
}
SFT (Signal Flow Tests)
SFTs test the interaction between modules. They use stub/mock versions of dependencies (note the -sft library variants in CMakeLists.txt):
- up-dp-gwu-sft – SFT variant of the main GWU library
- up-dp-cgnat-sft – SFT variant of CGNAT
- Compiled with ENV_UT and UP_SFT defines
dstrace (Distributed Tracing)
The dstrace library integrates with OpenTelemetry for distributed tracing across microservices. It allows you to trace a request from SMF → UPC → data-plane.
// Create a span for a config operation
dstrace_span_t* span = dstrace_span_start(tracer, "apply_config");
// ... do work ...
dstrace_span_end(span);
📝 Quiz 5 – Libraries
Q1: What is up-common?
Q2: What Linux system call does EVL wrap?
Q3: Why use a timer wheel instead of individual cbtimers for sessions?
Q4: What compile defines are set for SFT test builds?
Q5: What does evl_defer_low() do differently from evl_defer()?
Chapter 18: CMake Structure & 3PP Management
CMake Organization
The project uses CMake as its build system. Each subdirectory has its own CMakeLists.txt:
data-plane/
├── main/CMakeLists.txt     # Main executable + libraries
├── pktio/CMakeLists.txt    # Packet I/O library
├── vrf/CMakeLists.txt      # VRF library
├── cgnat/CMakeLists.txt    # CGNAT library
├── sc/CMakeLists.txt       # Service Chaining library
├── ipfix/CMakeLists.txt    # IPFIX library
├── contest/CMakeLists.txt  # Integration tests
└── up-common/              # Submodule with its own CMake
Library Naming Convention
Libraries follow the pattern up-dp-{module} with SFT variants:
- up-dp-vrf – Production VRF library
- up-dp-vrf-sft – SFT variant (stubs, test hooks)
- up-dp-gwu – Main "gateway user-plane" library (the big one)
- up-dp-gwu-sft – SFT variant of GWU
Build Targets
# The main executable
add_executable(data-plane main.c)
# Links against everything
target_link_libraries(data-plane PRIVATE
up-dp-gwu # Main library
up-dp-vrf # VRF
up-dp-cgnat # CGNAT
up-dp-sc # Service Chaining
up-dp-ipfix # IPFIX
up-dp-pktio # Packet I/O
up-evl # Event loop
up-mbox # Mailbox
up-rcu # RCU
...
)
3PP (Third-Party Packages)
Third-party dependencies are managed via the staging directory. Running bob generate:3pp downloads and unpacks them:
# Key 3PP dependencies:
- DPDK # Data Plane Development Kit
- jansson # JSON parsing
- hiredis # Redis client
- flatbuffers # Serialization (for Kafka messages)
- grpc/protobuf # gRPC (routing agent communication)
- openssl # TLS
- abseil (absl) # Google's C++ utilities
- yyjson # Fast JSON parser
Chapter 19: Bob Tool & Rulesets
What is Bob?
Bob is Ericsson's ADP (Application Development Platform) build orchestration tool. It reads ruleset2.0.yaml files and executes build rules in Docker containers for reproducibility.
Key Bob Commands
# Initialize bob (downloads bob binary)
$ bob/bob init-dev
# Generate 3PP staging (download dependencies)
$ bob/bob generate:3pp
# Generate CMake build system
$ bob/bob -p build-dir=builds/san generate:cmake
# Build C++ targets
$ bob/bob -p build-dir=builds/san build:cpp
# Run tests
$ bob/bob -p build-dir=builds/san test:cpp
# Build Docker image
$ bob/bob -p build-dir=builds/release build image package
Ruleset Structure (ruleset2.0.yaml)
The ruleset defines build rules, Docker images, properties, and task sequences. Key sections:
- docker-images – Build environment containers
- properties – Variables (repos, versions, paths)
- rules – Named tasks (generate, build, test, image, etc.)
- env – Environment variables
Bob Parameters (-p)
| Parameter | Effect |
|---|---|
| build-dir=X | Output directory for build artifacts |
| cc=clang | Use Clang compiler |
| cxx=clang++ | Use Clang++ for C++ |
| asan=on | Enable AddressSanitizer |
| ubsan=on | Enable UndefinedBehaviorSanitizer |
| assert=on | Enable assertions |
| coverage=on | Enable code coverage |
| lto=off | Disable Link-Time Optimization (faster builds) |
| shared-libs=on | Build shared libraries (faster linking) |
| cpp-target=X | Build only specific target |
Chapter 20: Makefile Developer Shortcuts
The Developer Makefile
The top-level Makefile wraps bob commands into convenient shortcuts. Run make help to see all targets:
# Quick test (assert build, fast)
$ make test
# Test with sanitizers (catches memory bugs)
$ make testsan
# Test a specific suite
$ make test t=my_module_SUITE
# Build a Docker image for system test
$ make image
# Run contest (container integration tests)
$ make contest
# Get test coverage report
$ make testcov
# Run linter (clang-tidy on your commit)
$ make lint
# Generate compile_commands.json for IDE
$ make lsp
# Clean build artifacts
$ make clean
# Nuclear option β delete everything
$ make realclean
Build Variants
| Variant | Directory | Purpose |
|---|---|---|
| assert | builds/assert | Fast build with assertions. Default for make test |
| san | builds/san | ASan + UBSan. Catches memory bugs. Used for make testsan |
| san2 | builds/san2 | Alternative sanitizer config (GCC instead of Clang) |
| debug | builds/debug | No optimization (-O0). Best for GDB debugging |
| cov | builds/cov | Coverage instrumentation. For make testcov |
| release | builds/release | Production build (LTO, optimized). For make image |
Use make test for quick iteration and make testsan before pushing. The CI runs both plus more.
ccache
The Makefile auto-detects ccache directories (/local/scratch/ccache or /local/persistent_docker/ccache) to speed up rebuilds. If you have ccache configured, subsequent builds are much faster.
Chapter 21: Unit Tests & SFT
Test Organization
Tests live in test/ subdirectories within each module:
cgnat/test/ut_cgnat_*.c # CGNAT unit tests
pktio/test/test_pktio*.c # PKTIO tests
sc/test/utest_*.c # Service Chaining tests
ipfix/test/ut_ipfix_*.c # IPFIX tests
vrf/test/ # VRF tests
main/linux/test/test_linux.c # Linux-specific tests
Running Tests
# Run all tests
$ make test
# Run specific test suite
$ make test t=ut_cgnat_ip_pool_SUITE
# Run with sanitizers
$ make testsan t=ut_cgnat_ip_pool_SUITE
# Under the hood, bob uses CTest:
$ cd builds/san && ctest -R ut_cgnat
Writing a Unit Test
#include "et/et.h"
#include "cgnat/cgnat.h"
// Setup/teardown per test
static cgnat_t* cgnat;
static void setup(void) {
cgnat = cgnat_create(test_deps);
}
static void teardown(void) {
cgnat_delete(cgnat);
}
ET_TEST(cgnat_pool, allocate_returns_valid_ip) {
ip_addr_t addr = cgnat_pool_allocate(cgnat->pool);
ET_ASSERT(addr.s_addr != 0);
ET_ASSERT_EQ(addr.s_addr, expected_first_ip);
}
ET_TEST_SUITE(cgnat_pool) {
ET_SETUP(setup);
ET_TEARDOWN(teardown);
ET_RUN_TEST(cgnat_pool, allocate_returns_valid_ip);
}
SFT vs UT
| Aspect | Unit Test (UT) | Signal Flow Test (SFT) |
|---|---|---|
| Scope | Single function/module | Multiple modules interacting |
| Dependencies | Fully mocked | Real modules with stub externals |
| Libraries | Standard libs | *-sft variants (e.g., up-dp-gwu-sft) |
| Speed | Very fast | Slower (more setup) |
| Defines | ENV_UT | ENV_UT + UP_SFT |
Chapter 22: Contest & VETO
Contest (Container Integration Tests)
Contest runs the actual data-plane binary in a Docker container with simulated network interfaces. It tests end-to-end packet forwarding.
# Run contest locally
$ make contest
# What happens:
# 1. Builds data-plane with sanitizers
# 2. Builds Docker image
# 3. Starts docker-compose with:
# - data-plane container (SUT)
# - test driver container
# - mock services (Redis, CM Mediator, etc.)
# 4. Runs test scenarios
Contest Structure
contest/
├── contest.yaml         # Docker-compose definition
├── run.sh               # Test runner script
├── main.c               # Test driver entry point
├── framework/           # Test utilities
│   ├── contest.c        # Framework core
│   ├── pfcp.c           # PFCP message builder
│   ├── config.c         # Config injection
│   ├── session.c        # Session management
│   ├── io.c             # Packet send/receive
│   └── sut.c            # System Under Test control
├── tests/               # Test cases
│   ├── test_pgwu.c      # PGW-U scenarios
│   ├── test_colocated.c # Co-located SGW-U + PGW-U
│   ├── test_vpngw.c     # VPN-GW scenarios
│   └── test_redis.c     # Redis/geo-redundancy tests
└── simulator/           # Mock external services
VETO (System Test)
VETO is the full system test that runs on real hardware or cloud infrastructure with the complete PCG deployment. It's triggered by the CI pipeline (JenkinsfileVeto).
VETO test channels are allocated via evrtd (test channel allocation). See contest/evrtd/ for SSH tunnel scripts.
Chapter 23: Fuzz Testing, ISSU & Benchmarks
Fuzz Testing
The project includes fuzz tests that feed random/malformed data to parsers:
pktio/test/fuzz_pktio.cβ Fuzz the packet parser with random bytes- Uses libFuzzer (Clang's built-in fuzzer)
- Catches crashes, buffer overflows, undefined behavior
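A minimal harness has the shape below. LLVMFuzzerTestOneInput is the standard libFuzzer entry point; parse_packet is a hypothetical stand-in for the real parser under test:

```c
/* LLVMFuzzerTestOneInput is the standard libFuzzer entry point.
 * parse_packet is a hypothetical stand-in for the parser under test.
 * Build with: clang -g -fsanitize=fuzzer,address ... */
#include <stdint.h>
#include <stddef.h>

int parse_packet(const uint8_t* data, size_t len); /* hypothetical SUT */

int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)
{
    /* Feed raw bytes straight into the parser; ASan/UBSan flag any
     * out-of-bounds access or undefined behavior it triggers. */
    parse_packet(data, size);
    return 0; /* non-zero values are reserved by libFuzzer */
}
```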
ISSU (In-Service Software Upgrade)
ISSU tests verify that the data-plane can be upgraded without dropping traffic. The new version takes over sessions from the old version via shared state in Redis.
Benchmarks
Performance benchmarks measure packets-per-second and latency:
pktio/test/bench_pktio.cβ Raw packet I/O throughputpktio/test/bench_encap.cβ Encapsulation performancepktio/test/bench_encap_mpls.cβ MPLS encap performancepktio/test/bench_ingress_spraying.cβ Spraying overhead
Stress Tests
pktio/test/pktio_stress.c β Long-running stress test that exercises the packet path under load to find race conditions and memory leaks.
📝 Quiz 6 – Testing
Q1: What is the difference between UT and SFT?
Q2: What does contest test?
Q3: Which command runs tests with AddressSanitizer?
Q4: What does ISSU testing verify?
Q5: What tool is used for fuzz testing?
Chapter 24: PreCodeReview Pipeline
What Triggers It
Every time you push a patch to Gerrit, the JenkinsfilePreCodeReview pipeline runs automatically. It's the gatekeeper: your patch won't get merged if this fails.
Pipeline Stages
What Gets Checked
- Compilation: Must compile cleanly with Clang (warnings are errors)
- Tests: All tests must pass (assert build + sanitizer build)
- Clang-tidy: No new warnings in changed files
- Commit message: Must follow format (Jira ticket, proper subject line)
- Contest: End-to-end tests must pass
Run make test and make lint locally before pushing. This catches most issues before CI runs (saves 30+ minutes of waiting).
Chapter 25: Drop Pipeline
What is a "Drop"?
A drop is a versioned release candidate. The Drop pipeline (JenkinsfileDrop) runs on the main branch after patches are merged. It produces artifacts that can be deployed.
Drop Pipeline Stages
- Build – Full release build (LTO, optimized)
- Test – Complete test suite (UT + SFT + contest)
- Image – Build production Docker image
- Helm Package – Package Helm chart
- Publish – Push image to Docker registry, chart to Helm repo
- VA Scan – Vulnerability scanning (Trivy, Grype)
- Design Rule Check – ADP compliance checks
- Version – Tag with version number
Artifact Repositories
| Artifact | Repository |
|---|---|
| Docker image | serodocker.sero.gic.ericsson.se/proj-pc-dev/ |
| Helm chart (dev) | proj-pc-dev-helm-local |
| Helm chart (drop) | proj-pc-drop-1-helm-local |
| Documentation | proj-pc-marketplace-docs-dev-generic-local |
Chapter 26: PRA (Release) Pipeline
PRA = Product Release Approval
The PRA pipeline produces the official released version that goes to customers. It runs additional quality gates beyond the Drop pipeline:
- Full VA (Vulnerability Assessment) scan suite
- FOSSA license compliance check
- EVMS (Ericsson Vulnerability Management System) registration
- Munin/PLMS product lifecycle registration
- Documentation publishing
- Released artifact promotion (dev → released repos)
Version Scheme
# Version format: PREFIX + auto-increment
$ cat VERSION_PREFIX
CXU1012345
# Full version example: CXU1012345_1_R2A (drop 2, revision A)
Chapter 27: VA & SoC Pipelines
VA 2.0 (Vulnerability Assessment)
The VA pipeline scans for security vulnerabilities using multiple tools:
| Tool | What it Scans |
|---|---|
| Trivy | Container image CVEs (OS packages, libraries) |
| Grype | Additional CVE scanning (different database) |
| Hadolint | Dockerfile best practices |
| Kubesec | Kubernetes security configuration |
| Kubeaudit | Kubernetes security audit |
| CIS-CAT | CIS benchmark compliance |
| Xray | Artifactory-based vulnerability scan |
| CAPPA | ADP-specific security checks |
Daily Pipeline
The JenkinsfileDaily runs nightly and catches issues that don't block individual patches:
- Full test suite with extended timeout
- Memory leak detection (long-running tests)
- Performance regression detection
Readiness Check
JenkinsfileReadinessCheck verifies that the service meets ADP readiness criteria before release.
Chapter 28: Docker Image
Image Structure
The data-plane Docker image is built from a minimal base (ADP Base OS) and contains:
- The data-plane binary (statically linked, stripped for release)
- DPDK drivers and configuration
- Health check scripts
- Minimal OS utilities
Building an Image for ST
# Build a release image
$ make image
# Output tells you the tag to use in ST:
# helm.flags=eric-pc-up-data-plane.imageCredentials.repoPath=proj-pc-dev,
# eric-pc-up-data-plane.images.dataPlane.tag=1.2.3.unstripped
Image Variants
- Release (stripped) – Production image, no debug symbols
- Unstripped – Has debug symbols for crash analysis
- Debug – Built with -O0 for GDB attachment
Contest Dockerfile
Contest uses a separate contest/Dockerfile that includes the test driver alongside the SUT (System Under Test).
Chapter 29: Helm Chart & Kubernetes
Helm Chart: eric-pc-up-data-plane
The data-plane is deployed as a Kubernetes DaemonSet or Deployment via Helm. Key resources:
- DaemonSet/Deployment – The data-plane pods
- ConfigMap – Static configuration
- Service – Internal service discovery
- ServiceAccount – RBAC permissions
- NetworkAttachmentDefinition – SR-IOV interface definitions
- PodDisruptionBudget – Ensures availability during upgrades
Key Helm Values
# Typical values override for ST:
eric-pc-up-data-plane:
replicaCount: 2
resources:
dataPlane:
requests:
cpu: "8"
memory: "16Gi"
hugepages-1Gi: "4Gi"
limits:
cpu: "8"
memory: "16Gi"
hugepages-1Gi: "4Gi"
imageCredentials:
repoPath: proj-pc-dev
images:
dataPlane:
tag: "your-version"
Pod Resource Requirements
- CPU: Dedicated cores (guaranteed QoS class) – typically 4-16 cores
- Memory: Large allocation for packet buffers and session tables
- Hugepages: Required by DPDK for DMA-capable memory (1GB pages)
- SR-IOV VFs: Virtual Functions for direct NIC access
Chapter 30: SR-IOV, DPDK & Cloud-Native I/O
Why DPDK?
Standard Linux networking (kernel TCP/IP stack) adds too much overhead for millions of packets/second. DPDK bypasses the kernel entirely:
SR-IOV (Single Root I/O Virtualization)
SR-IOV allows a physical NIC to present multiple Virtual Functions (VFs) that appear as independent NICs to the pod. Each data-plane pod gets its own VF with direct hardware access.
Memory: Hugepages
DPDK requires hugepages (1GB or 2MB pages) for:
- Packet buffer pools (mbufs)
- Ring buffers (NIC queues)
- Hash tables (flow tables, session tables)
Without hugepages, TLB misses would destroy performance.
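Most of that hugepage memory is consumed by the mbuf pool. A minimal sketch using the standard DPDK API (the sizes are arbitrary examples, not the production values):

```c
/* Sizes are arbitrary examples, not the production values. */
#include <stdlib.h>
#include <rte_mbuf.h>
#include <rte_lcore.h>
#include <rte_errno.h>
#include <rte_debug.h>

struct rte_mempool* create_pkt_pool(void)
{
    /* 1M mbufs with a per-core cache; the backing memory comes from
     * the hugepages reserved for the pod. */
    struct rte_mempool* pool = rte_pktmbuf_pool_create(
        "pkt_pool",
        1 << 20,                   /* number of mbufs */
        512,                       /* per-lcore cache size */
        0,                         /* private data size */
        RTE_MBUF_DEFAULT_BUF_SIZE, /* data room per mbuf (~2KB) */
        rte_socket_id());          /* allocate on the local NUMA node */
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool: %s\n", rte_strerror(rte_errno));
    return pool;
}
```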
CPU Isolation
Data-plane pods use CPU Manager (Kubernetes) to get dedicated cores. The kernel's scheduler never runs other workloads on these cores, ensuring consistent latency.
Chapter 31: Linting & CodeChecker
Clang-Tidy
The project uses clang-tidy for static analysis. Configuration is in cdpi-main/.clang-tidy (29K lines!) which enables/disables specific checks.
# Run clang-tidy on your changes
$ make lint
# What it checks:
# - Bugprone patterns (use-after-move, suspicious comparisons)
# - Modernize suggestions (C++17 features)
# - Performance issues (unnecessary copies)
# - Readability (naming conventions)
# - Custom Ericsson checks
CodeChecker
For deeper analysis, CodeChecker runs Clang Static Analyzer:
# Run CodeChecker (slower but more thorough)
$ make codechecker
# Generates HTML report at:
# builds/assert/code-checker-report/
# View with: python3 -m http.server -d ./builds/assert/code-checker-report
Commit Linting
The CI also checks commit message format:
# Good commit message:
PCPB-12345: Add timeout handling to CGNAT garbage collector
Implement periodic cleanup of expired NAT translations.
The garbage collector now runs every 30 seconds and removes
translations that have been idle beyond the configured timeout.
# Bad (will be rejected):
fix stuff
Chapter 32: Design Rule Checks
ADP Design Rules
Ericsson's ADP (Application Development Platform) defines mandatory design rules for cloud-native services. The CI checks compliance automatically.
Helm DR (Design Rules)
Checked by adp-helm-dr-check Docker image. Validates:
- Chart structure and naming conventions
- Required labels and annotations
- Resource limits defined
- Security context (non-root, read-only filesystem)
- Service mesh compatibility
- Graceful shutdown support
Image DR (Design Rules)
Checked by adp-image-dr-check. Validates:
- Base image is ADP Base OS
- No unnecessary packages installed
- Non-root user
- Proper labels (version, name, vendor)
- No hardcoded secrets
- Minimal attack surface
PLM DR (Product Lifecycle Management)
Ensures proper product registration in Ericsson's systems (Munin, EVMS).
Chapter 33: VA Scans, FOSSA & EVMS
Vulnerability Assessment Flow
FOSSA (License Compliance)
FOSSA scans all dependencies for license compliance. It ensures:
- No GPL-licensed code in the product (incompatible with proprietary)
- All 3PP licenses are documented
- Attribution notices are included
- No license conflicts between dependencies
EVMS (Ericsson Vulnerability Management System)
All known CVEs in the product are registered in EVMS. The team must:
- Acknowledge each CVE
- Assess impact (exploitable in our context?)
- Plan remediation (upgrade dependency, apply patch, accept risk)
Munin / PLMS
Product registration system. Each release is registered with:
- Version number
- Bill of materials (all 3PP components)
- License information
- Security assessment status
📝 Quiz 7 – Quality & Compliance
Q1: What does `make lint` run?
Q2: Why is FOSSA scanning important?
Q3: What does the Image DR check verify?
Q4: What is EVMS used for?
Q5: Which tools scan the Docker image for CVEs?
Chapter 34: Gerrit Workflow & Making a Test Image
Gerrit Patch Lifecycle
Commit Message Format
PCPB-XXXXX: Short description (max ~72 chars)
Longer description explaining WHY the change is needed.
What problem does it solve? What approach was taken?
- Bullet points for multiple changes
- Reference related patches if needed
Change-Id: I1234567890abcdef (auto-generated by git hook)
Making a Test Image for ST
# Step 1: Build the release image
$ make image
# Step 2: Note the output β it gives you the Helm override:
# helm.flags=eric-pc-up-data-plane.imageCredentials.repoPath=proj-pc-dev,
# eric-pc-up-data-plane.images.dataPlane.tag=X.Y.Z.unstripped
# Step 3: In ST (Beets), paste that into "Custom Arguments"
# Alternative: Push to your personal repo
$ docker tag data-plane:latest serodocker.sero.gic.ericsson.se/proj-pc-dev/data-plane:my-test
$ docker push serodocker.sero.gic.ericsson.se/proj-pc-dev/data-plane:my-test
Amending a Patch
# After review feedback, amend your commit:
$ git add -p # Stage fixes
$ git commit --amend # Amend (keeps Change-Id)
$ git push origin HEAD:refs/for/master # Push new patchset
Rebasing on Latest Master
$ git fetch origin
$ git rebase origin/master
# Fix conflicts if any
$ git push origin HEAD:refs/for/master
Chapter 35: Debugging, Common Pitfalls & Useful Links
Debugging with GDB
# Build debug variant
$ make builds/debug
# Run under GDB (for unit tests)
$ cd builds/debug
$ gdb ./my_test_binary
(gdb) break my_function
(gdb) run
# Attach to running container (for contest/ST)
$ kubectl exec -it data-plane-pod -- /bin/sh
$ gdb -p $(pidof data-plane)
Reading Core Dumps
# With unstripped image:
$ gdb ./data-plane core.12345
(gdb) bt # Backtrace
(gdb) info threads # All threads
(gdb) thread 3 # Switch to thread 3
(gdb) bt # Backtrace of that thread
Common Pitfalls
| Pitfall | Symptom | Fix |
|---|---|---|
| Forgetting rcu_synchronize | Use-after-free on config update | Always synchronize before freeing old data |
| Timer not cancelled in stop | Crash during shutdown | Cancel all timers before calling on_stopped_cb |
| Blocking in controller thread | Pod killed (liveness probe fails) | Use defers for long operations |
| Cross-CPU timer operation | Corruption/crash | Use mbox_dispatcher to run on correct CPU |
| Missing EXCLUDE_FROM_ALL | Library built even when not needed | Add EXCLUDE_FROM_ALL to add_library() |
| Large config loop | Health check timeout | Use EVL batch iterator with low priority |
| Sanitizer suppression needed | False positive in 3PP | Add to lsan_suppr.txt or asan suppression file |
Useful Make Targets Cheat Sheet
make test # Quick test (assert build)
make testsan # Test with sanitizers
make test t=FOO_SUITE # Run specific suite
make contest # Container integration test
make image # Build Docker image for ST
make lint # Clang-tidy on your commit
make lsp # Generate compile_commands.json
make testcov # Coverage report
make codechecker # Deep static analysis
make clean # Clean builds (keep configs)
make realclean # Delete everything
Useful Links
- Gerrit: Your code review platform
- Jenkins: CI/CD dashboards
- Jira (PCPB project): Feature tracking
- Confluence: Team documentation
- Artifactory: Artifact repositories
- Grafana: Production metrics dashboards
Once you understand one subsystem (say, punt/), you can navigate any of them; they all follow the same patterns.
🎉 Congratulations!
You've completed the Data Plane Microservice Learning Guide.
35 chapters • 7 quizzes • From 3GPP fundamentals to daily debugging
Now go break something in the codebase and fix it. That's how you really learn.