Socket Statistics on Linux with ss

April 3, 2019 · 3 minute read

An excellent tool for troubleshooting distributed systems is ss, which provides visibility into the TCP states of your live connections. Even if you know netstat, ss is worth getting acquainted with: its filtering interface is more powerful, so you need fewer pipes to find the connection information you want, and it is probably already installed on your favorite distribution.

In this post, we walk through a couple of examples, and then show a real-world production problem that can be diagnosed by effective use of ss.

To list TCP connections [-t] with destination port 9200 [dst :9200], showing numeric ports [-n] and resolving addresses to hostnames [-r], we use the following:

$ ss -rnt dst :9200

The output of the above command might look something like this:

Recv-Q Send-Q                         Local Address:Port                           Peer Address:Port
0      0                                  localhost:28655                             localhost:9200

# OR

Recv-Q Send-Q             Local Address:Port               Peer Address:Port
0      0         mysvc1-app3.dc1.exampleapp.com:44541      logs.dc1.exampleapp.com:9200
0      0         mysvc1-app3.dc1.exampleapp.com:33389      logs.dc1.exampleapp.com:9200
0      0         mysvc1-app3.dc1.exampleapp.com:45740      logs.dc1.exampleapp.com:9200
0      0         mysvc1-app3.dc1.exampleapp.com:53848      logs.dc1.exampleapp.com:9200
0      0         mysvc1-app3.dc1.exampleapp.com:33406      logs.dc1.exampleapp.com:9200
0      0         mysvc1-app3.dc1.exampleapp.com:33407      logs.dc1.exampleapp.com:9200
0      0         mysvc1-app3.dc1.exampleapp.com:33405      logs.dc1.exampleapp.com:9200

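Now, this isn’t very remarkable. Let’s try to find TCP connections in the CLOSE_WAIT state instead. The command below lists them for every peer; appending a filter such as dst :50010 would narrow the list to a specific destination port: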

$ /usr/sbin/ss -tarn state close-wait

Recv-Q Send-Q                              Local Address:Port                                Peer Address:Port
1      0         mysvc-app1.dc1.exampleapp.com:40701    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:29470    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:35594    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:39683    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:22715    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:23824    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:37015   mysvc-app10.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:28927    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:37298    mysvc-app1.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:42596   mysvc-app20.dc1.exampleapp.com:50010
1      0         mysvc-app1.dc1.exampleapp.com:45345    mysvc-app1.dc1.exampleapp.com:50010
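
When a list like this gets long, a per-peer tally can make the pattern easier to see. The following is a minimal sketch, assuming the four-column layout shown above (Recv-Q, Send-Q, local address, peer address); count_by_peer is a hypothetical helper name, not something ss provides:

```shell
# count_by_peer: hypothetical helper, not part of ss.
# Reads ss output on stdin and prints "<count> <peer>" per peer address:port,
# busiest peer first. Assumes the peer is the fourth column, as in the
# output above; the header line is skipped because it does not start
# with a number.
count_by_peer() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 4 { seen[$4]++ }
       END { for (p in seen) print seen[p], p }' | sort -k1,1nr -k2,2
}

# Example use (left as a comment so the snippet has no side effects):
#   ss -tarn state close-wait | count_by_peer
```

This does add one pipe, but only for the summary; the filtering itself is still done by ss.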

What about timers on sockets? Well, ss has what you want. The following lists all [-a] TCP sockets [-t] in the TIME_WAIT state, with resolved hostnames [-r], numeric ports [-n], and timer information [-o]:

$ ss -arnto state time-wait

Recv-Q Send-Q                              Local Address:Port                                Peer Address:Port
0      0         mysvc-app1.dc1.exampleapp.com:21665                   ldap.dc1.exampleapp.com:389    timer:(timewait,22sec,0)
0      0         mysvc-app1.dc1.exampleapp.com:45311   mysvc-app13.dc1.exampleapp.com:2191   timer:(timewait,24sec,0)
0      0         mysvc-app1.dc1.exampleapp.com:21606                   ldap.dc1.exampleapp.com:389    timer:(timewait,2.730ms,0)
0      0         mysvc-app1.dc1.exampleapp.com:40319    mysvc-app2.dc1.exampleapp.com:9092   timer:(timewait,4.949ms,0)
0      0         mysvc-app1.dc1.exampleapp.com:37364   mysvc-app12.dc1.exampleapp.com:2191   timer:(timewait,28sec,0)
0      0                                       localhost:39282                                  localhost:17123  timer:(timewait,54sec,0)

Look ma, no pipes!

Looking for which process is listening on a port? Simple:

  • using -l we limit ourselves to listening sockets
  • using -t we limit ourselves to TCP
  • using -n we make sure the port is shown as a number rather than a service name
  • using -r we would resolve hostnames (left out of the command below, so addresses stay numeric)
  • using -p we show processes using that socket

Remember that to get process information you usually need sudo :)

$ sudo ss -nltp state listening src :3030
Recv-Q Send-Q                                                           Local Address:Port                                                             Peer Address:Port
0      100                                                                  127.0.0.1:3030                                                                        *:*      users:(("sensu-client",46889,17))

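Note that the users output is a 3-tuple: the process name is the first entry, the PID is the second, and the third is the file descriptor number the process holds for that socket. Older versions of ss print the values bare, as above, while newer versions label them as pid= and fd=.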
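
If you want the owning process in a script-friendly form, the users field can be split apart. This is a rough sketch, assuming the two tuple layouts that appear in this post, ("name",pid,fd) and ("name",pid=N,fd=N), where the third field is the socket's file descriptor; socket_owners is a hypothetical helper name, and it relies on grep -o, which GNU grep supports:

```shell
# socket_owners: hypothetical helper, not part of ss.
# Reads `ss -p` output on stdin and prints "<name> <pid> <fd>" for each
# process tuple in the users:(...) field. Handles both the bare
# ("name",pid,fd) form and the labelled ("name",pid=N,fd=N) form.
socket_owners() {
  grep -o '("[^"]*",[^)]*)' |
    sed -e 's/^("//' -e 's/",/ /' -e 's/pid=//' -e 's/fd=//' \
        -e 's/,/ /' -e 's/)$//'
}

# Example use (left as a comment so the snippet has no side effects):
#   sudo ss -nltp src :3030 | socket_owners
```

Process names that contain spaces or quotes would need more careful handling than this sketch gives them.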

Can Anybody Hear Me?

Finally, let us walk through a production issue where ss helped us get to the root cause faster. Recently an alert was generated when one of our services in an environment was no longer receiving Kafka messages. After taking the necessary steps to ensure that the Kafka cluster was up and functioning correctly, we went to the node that alerted. There we used ss to check for TCP connections to Kafka. We saw no established connections, but we did see a connection in TCP state 'syn-sent'[1]:

$ sudo ss -antlp state syn-sent dst :9200
Recv-Q                        Send-Q                                               Local Address:Port                                                 Peer Address:Port
0                             1                                                        10.4.1.40:48928                                                   10.4.1.80:9200                         users:(("java",pid=13657,fd=3))

The output above tells us that this host is trying to talk to Kafka but cannot complete the connection to the broker: a SYN has been sent and no reply has come back. That pointed us toward a possible networking problem, and in this specific case the DNS entry for Kafka had changed while the application had cached the old address.
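
A check along these lines can spot the stale address directly by comparing the peers ss reports with what the name resolves to now. This is only a sketch under stated assumptions: stale_peers is a hypothetical helper name, it expects numeric IPv4 addresses (so run ss without -r), and the hostname in the usage comment is made up:

```shell
# stale_peers: hypothetical helper, not part of ss.
# $1 is the IP address the service name resolves to right now.
# Reads ss output on stdin and prints every peer (address:port) whose
# address differs from $1. IPv4 only, and assumes the peer is the
# fourth column, as in the output above.
stale_peers() {
  awk -v want="$1" '
    $1 ~ /^[0-9]+$/ && NF >= 4 {
      peer = $4
      addr = peer
      sub(/:[0-9]+$/, "", addr)   # strip the trailing :port
      if (addr != want) print peer
    }'
}

# Example use (hostname is a placeholder; left as a comment on purpose):
#   current=$(getent hosts kafka.example.com | awk '{ print $1; exit }')
#   sudo ss -ant state syn-sent dst :9200 | stale_peers "$current"
```

Any line it prints is a connection attempt aimed at an address the name no longer resolves to, which is the symptom we saw here.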

1: https://en.wikipedia.org/wiki/TCP_half-open#Embryonic_connection