(The format is weird because it is meant for microblogging platforms.)
Tutorial: a rudimentary but working video chat application using only FFmpeg. FFmpeg can do a lot of things, including letting us build a video chat system; it will lack a lot of important features, but given the right conditions it will work.
1/20
Setting up the connection will be very manual. There will be no bandwidth management. No echo cancellation. Worst of all: the hosts need to be able to initiate connections in both directions. If there's a NAT in the way, you'll need a workaround, and I don't intend to cover that here.
2/20
And it will take a lot of FFmpegs: two directions, two streams (video and audio) in each direction, two ends to each stream: that's eight instances of ffmpeg or ffplay; ten if we want to see ourselves locally.
3/20
Let's start. The two hosts have names or IP addresses, let us say here and there. The system is symmetrical: I will write how to send my side of the conversation from here to there; just swap the hosts for the other side of the conversation.
4/20
First: video input. FFmpeg can stream anything it can read. We'll just use the webcam: ffmpeg -f v4l2 -i /dev/video0. Look at other tutorials for codec settings, screen recording, etc. The input needs to arrive in real time; use -re or -vf realtime if it is read faster than that.
5/20
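Before picking options, it helps to know what the camera actually offers. A quick probe, assuming a V4L2 webcam at /dev/video0:

```shell
# List the pixel formats, resolutions and frame rates the camera supports.
ffmpeg -hide_banner -f v4l2 -list_formats all -i /dev/video0

# Then pin the input to a known mode (values here are just examples):
# ffmpeg -f v4l2 -video_size 640x480 -framerate 30 -i /dev/video0 ...
```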
Second: encoding. Let's keep it simple: -c libx264 -preset medium -tune zerolatency -intra-refresh 1 -b 1024k. Latency is the enemy: we don't want the codec to wait for more frames just in case it can squeeze out a little more quality. Again, other tutorials exist.
6/20
And the streaming. We'll use RTP on an arbitrary port: -f rtp rtp://there:9123/. RTP relies on a small text file (SDP) to carry the stream settings to the destination; ffmpeg will print that file on its output, and you'll need to redirect or copy-paste it to there; let's call it vh2t.sdp.
7/20
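Instead of scraping the SDP from the console output, ffmpeg can write it straight to a file with -sdp_file, and scp can carry it over (file and host names as used in this thread):

```shell
# Write the SDP directly to vh2t.sdp while streaming, then copy it over.
ffmpeg -f v4l2 -i /dev/video0 -c libx264 -preset medium -tune zerolatency \
       -intra-refresh 1 -b 1024k -sdp_file vh2t.sdp -f rtp rtp://there:9123/ &
scp vh2t.sdp there:
```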
Full command:
ffmpeg -f v4l2 -i /dev/video0 -c libx264 -preset medium -tune zerolatency -intra-refresh 1 -b 1024k -f rtp rtp://there:9123/
8/20
For reference: an ffmpeg command has the structure ffmpeg [global options] [input options] -i input [other inputs] [output options] output [other outputs]. The global options can go anywhere. Input and output options can be set per stream, but we only have one stream here.
9/20
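A sketch of that structure with one input and two outputs, each output carrying its own options (file names purely illustrative):

```shell
# Output options apply to the output file that follows them;
# global options like -hide_banner can appear anywhere.
ffmpeg -hide_banner -f v4l2 -i /dev/video0 \
       -c libx264 -t 10 out-h264.mkv \
       -c libx265 -t 10 out-h265.mkv
```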
Now, on host there, we need to see the stream. We copied the vh2t.sdp file over somehow; now we use it: ffplay -protocol_whitelist file,rtp,udp vh2t.sdp. It starts a little faster if this end is launched first, but it works either way.
10/20
Notice: the latency is quite bad. Let's add options to reduce it: -flags low_delay -flags2 +showall. A little better, but still terrible, because ffplay honors the timestamps of the input: the initial delay stays. Let's kill that: -vf setpts=0.
11/20
Full command:
ffplay -fs -protocol_whitelist file,rtp,udp -flags low_delay -flags2 +showall -vf setpts=0 vh2t.sdp
12/20
Now, audio. The principle is the same: capture (alsa), encode (libopus), stream; this time, save the printed SDP as ah2t.sdp:
ffmpeg -f alsa -i default -c libopus -b:a 32k -f rtp rtp://there:9133/
ffplay -protocol_whitelist file,rtp,udp ah2t.sdp
13/20
Latency seems fine, but that's not guaranteed. The two sound cards' sampling clocks will not run at exactly the same rate. In the slow → fast direction, the destination will get underruns; that's ok. In the fast → slow direction, samples will accumulate and the delay will grow.
14/20
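To get a feel for the numbers, assume two nominally 48 kHz clocks that differ by 10 ppm (a made-up but plausible figure):

```shell
# Back-of-the-envelope drift: two "48 kHz" sound cards differing by 10 ppm.
# The slower receiver accumulates rate*ppm extra samples every second.
awk 'BEGIN { rate = 48000; ppm = 10e-6;
             printf "%.0f samples/hour = %.0f ms of extra delay/hour\n",
                    rate * ppm * 3600, ppm * 3600 * 1000 }'
```

A few tens of milliseconds per hour: imperceptible at first, but it keeps growing for the whole call, which is why the playback speed-up below is needed.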
So we'll skip a few seconds at the beginning, just to eliminate the initial delay, and we'll speed up the playback just a little: -af atrim=5,rubberband=1.0001.
15/20
Full command:
ffplay -nodisp -protocol_whitelist file,rtp,udp -af atrim=5,rubberband=1.0001 ah2t.sdp
16/20
It's nice to see ourselves. For that, we'll add a second video ffplay running on host here. We need to create a new SDP file, just like we created the first one, using a different port. We can also add scale=320:240,hflip to the -vf option: smaller, mirrored.
17/20
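Assuming the new SDP (for a here-bound copy of the stream, on port 9125) was saved as vh2h.sdp, the local monitor could look like this; vh2h.sdp is a name I'm choosing by analogy with vh2t.sdp:

```shell
# Local self-view: small, mirrored; no need for fullscreen here.
ffplay -protocol_whitelist file,rtp,udp \
       -vf scale=320:240,hflip,setpts=0 vh2h.sdp
```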
To send the stream to both players, we could add a second output to ffmpeg, but that would double the CPU cost of encoding. Better to encode once and stream twice: -f tee -map 0 '[f=rtp]rtp://there:9123/|[f=rtp]rtp://here:9125/'
18/20
Full command:
ffmpeg -f v4l2 -i /dev/video0 -c libx264 -preset medium -tune zerolatency -intra-refresh 1 -b 1024k -f tee -map 0 '[f=rtp]rtp://there:9123/|[f=rtp]rtp://here:9125/'
19/20
And that's all. With these five not-that-complicated FFmpeg commands, we can stream our image and voice to another host. Variants are easy: one to several, several to one. Any FFmpeg feature can be applied to the input before streaming, or to the stream on playback.
20/20
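As a recap, one side's half of the call could be scripted roughly like this. A sketch only: host names, ports and the h2t SDP names are the ones from this thread; vt2h.sdp and at2h.sdp are names I'm assuming by symmetry for the incoming streams, and there is no error handling:

```shell
#!/bin/sh
# Sketch of the "here" side: send video and audio to there, watch
# ourselves, and play what there sends back. SDP files must have been
# exchanged beforehand.

# Video: encode once, tee to there (9123) and back to ourselves (9125).
ffmpeg -f v4l2 -i /dev/video0 -c libx264 -preset medium -tune zerolatency \
       -intra-refresh 1 -b 1024k \
       -f tee -map 0 '[f=rtp]rtp://there:9123/|[f=rtp]rtp://here:9125/' &

# Audio to there.
ffmpeg -f alsa -i default -c libopus -b:a 32k -f rtp rtp://there:9133/ &

# Self-view (SDP for port 9125 saved as vh2h.sdp).
ffplay -protocol_whitelist file,rtp,udp -vf scale=320:240,hflip vh2h.sdp &

# Incoming video and audio (SDP files received from there).
ffplay -fs -protocol_whitelist file,rtp,udp -flags low_delay \
       -flags2 +showall -vf setpts=0 vt2h.sdp &
ffplay -nodisp -protocol_whitelist file,rtp,udp \
       -af atrim=5,rubberband=1.0001 at2h.sdp &

wait
```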