The Payments Observability Stack

Plural Online by Pine Labs

We are in the business of moving money. A payment transaction traverses multiple services within our stack. There are multiple failure points, and the ability to react to a failure and recover from it is critical to ensuring high availability for merchant partners and customers. The ability to trace and monitor a transaction is equally critical from an operational perspective and for the stakeholders involved. With so many services and stakeholders in the path of a payment transaction, exceptions cannot be avoided entirely. A reliable, reactive observability stack that monitors these exceptions and alerts on them in near real time will help the operations team identify issues such as outages in external systems or regressions from incremental code rollouts, and take corrective action. The stack should also provide the ability to trace a transaction as it flows through multiple services.

Functional Observability

  • Functional observability monitors the business context of a payment: approval rates, decline rates, spikes in declines, missing files, exceptions during file processing, etc.
  • The observability stack should surface business errors that occur when new functionality is rolled out to production, and make it possible to compare approval rates before and after a code rollout.
  • Telemetry data includes the flow-specific parameters logged by each service, as well as the same data aggregated across different parameters.

For example:

Approval rates across a specific Merchant

Decline rates across a specific Issuer

Count of specific Declines like Expired Card, Generic Decline etc.

This data lets developers debug at the per-transaction level

It also makes it possible to trigger an alert for business failures such as a spike in declines
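The aggregations and the decline-spike alert described above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the `Txn` record, its field names, the reason codes and the 10-point `max_increase` threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Txn:
    # Hypothetical per-transaction record; field names are illustrative.
    merchant: str
    issuer: str
    approved: bool
    decline_reason: Optional[str] = None  # None when the payment is approved


def approval_rate_by(txns, key):
    """Approval rate per value of `key` ("merchant" or "issuer")."""
    totals, approved = defaultdict(int), defaultdict(int)
    for t in txns:
        k = getattr(t, key)
        totals[k] += 1
        approved[k] += t.approved  # True counts as 1, False as 0
    return {k: approved[k] / totals[k] for k in totals}


def decline_counts(txns):
    """Number of declines per decline reason."""
    return Counter(t.decline_reason for t in txns if not t.approved)


def decline_spike(baseline, current, max_increase=0.10):
    """True when the decline rate rose by more than max_increase (absolute)."""
    def rate(txns):
        return sum(not t.approved for t in txns) / len(txns) if txns else 0.0
    return rate(current) - rate(baseline) > max_increase


# Two made-up windows, e.g. before and after a code rollout.
baseline = [
    Txn("m1", "i1", True),
    Txn("m1", "i1", True),
    Txn("m1", "i2", True),
    Txn("m2", "i2", False, "EXPIRED_CARD"),
]
current = [
    Txn("m1", "i1", True),
    Txn("m1", "i2", False, "GENERIC_DECLINE"),
    Txn("m2", "i2", False, "EXPIRED_CARD"),
    Txn("m2", "i2", False, "EXPIRED_CARD"),
]

rates_by_merchant = approval_rate_by(current, "merchant")
reasons = decline_counts(current)
alert = decline_spike(baseline, current)
```

Passing the window before a rollout as `baseline` and the window after it as `current` gives the before-and-after comparison mentioned earlier. In practice these aggregations would more likely run as queries against the log store than in application code.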

Non-Functional Observability

  • Non-functional monitoring covers the following metrics, collected at the host level and aggregated at the system level (application server, serverless, DB, etc.):

o Memory Usage

o CPU

o Throughput

o Latency
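As a rough sketch of what "collected at the host level and aggregated at the system level" can mean, the Python below rolls per-host samples up into one system view and flags hosts over a limit. The `HostSample` type, the host names, the choice of aggregation functions and the 85% / 90% limits are assumptions for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HostSample:
    # Hypothetical snapshot of one host; units are percent, req/s and ms.
    host: str
    cpu_pct: float
    mem_pct: float
    requests_per_sec: float
    latency_ms: float


def system_summary(samples):
    """Aggregate host-level samples into one system-level view."""
    return {
        "cpu_pct": mean(s.cpu_pct for s in samples),
        "mem_pct": mean(s.mem_pct for s in samples),
        "requests_per_sec": sum(s.requests_per_sec for s in samples),
        "worst_latency_ms": max(s.latency_ms for s in samples),
    }


def breaches(samples, cpu_limit=85.0, mem_limit=90.0):
    """List (host, metric, value) for hosts above the assumed limits."""
    found = []
    for s in samples:
        if s.cpu_pct > cpu_limit:
            found.append((s.host, "cpu", s.cpu_pct))
        if s.mem_pct > mem_limit:
            found.append((s.host, "memory", s.mem_pct))
    return found


samples = [
    HostSample("app-1", cpu_pct=40.0, mem_pct=50.0,
               requests_per_sec=120.0, latency_ms=35.0),
    HostSample("app-2", cpu_pct=95.0, mem_pct=60.0,
               requests_per_sec=80.0, latency_ms=210.0),
]

summary = system_summary(samples)
over_limit = breaches(samples)
```

Averaging CPU and memory while summing throughput and taking the worst latency is one reasonable choice among several; percentile latency is a common alternative when enough samples are available.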

Cornerstones of Pine Labs' Observability Stack

  • Functional Monitoring through the ELK Stack

o Business context data is logged for each transaction

o Metrics are extracted from the log files using Logstash and published to Kafka

o Data flows through Kafka to the Elasticsearch cluster

o Data is visualized using Kibana

  • Telemetry data is emitted by the application and sent to the sinks in a vendor-agnostic way using the OpenTelemetry APIs
  • Functional alerts are set up for business failures with a threshold, and the Operations team is notified so it can attend to issues proactively
  • Non-functional alerts for spikes in memory or CPU usage, or for an unresponsive app, help the team take corrective action
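To make the first cornerstone concrete, here is a hedged sketch of per-transaction business context logging using only the Python standard library. It writes one JSON object per line, a shape that Logstash can parse without custom patterns. The logger name, the `txn` attribute and every field name are hypothetical; the article does not specify the actual log schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Business context passed via `extra={"txn": {...}}`, if any.
        payload.update(getattr(record, "txn", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream=None):
    """Create a logger that emits JSON lines (to stderr when stream is None)."""
    logger = logging.getLogger("payments.txn")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def log_transaction(logger, **fields):
    """Log one transaction with its business context fields."""
    logger.info("transaction", extra={"txn": fields})
```

A call such as `log_transaction(build_logger(), txn_id="t-1", merchant="m1", status="DECLINED", decline_reason="EXPIRED_CARD")` emits a single line carrying those fields alongside the timestamp and level, ready to be shipped downstream.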

(The author of this article is Siva Shankar, VP Engineering – Payments, at Pine Labs. Views expressed in this article are those of the author.)
