Reducing the delay of live streams by using 3 simple techniques
The 2018 World Cup in Russia sparked a discussion about live streaming delays. Why does it take 30 seconds after there is a goal on TV before I see it on my smartphone? Delay is an inherent problem of HTTP based streaming protocols, which we started using about 10 years ago. With 3 simple techniques outlined below, VRT was able to bring down latency (or delay) from 30 to 10 seconds.
What are HTTP based streaming protocols and why is there a delay?
Before HTTP streaming there were other protocols such as RTSP and RTMP which had the downside of requiring dedicated infrastructure to distribute (remember the Flash Media Server, or RealNetworks’ Helix Server?). HTTP streaming protocols can use standard HTTP infrastructure which makes it much more straightforward to build and maintain the distribution infrastructure (you can use a standard web server such as Apache or Nginx).
HTTP streaming protocols cut the stream into segments, which are files that can be transported over HTTP like any other file. To ensure smooth playback, the player buffers a couple of segments before it starts playing. If a segment is 6 seconds long and the player buffers 3 segments, this introduces a delay of 18 seconds.
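The buffering arithmetic above can be written as a tiny Python sketch. The function name is made up for illustration, and the 2 second segment is a hypothetical value for comparison. It only models the delay added by the player buffer, not encoding or CDN delay.

```python
def live_delay_seconds(segment_seconds, buffered_segments):
    """Minimum delay added by the player buffer alone."""
    return segment_seconds * buffered_segments


# 6 second segments with a 3 segment buffer, as in the example above.
print(live_delay_seconds(6, 3))  # 18

# A hypothetical 2 second segment with the same buffer depth.
print(live_delay_seconds(2, 3))  # 6
```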
1. Deliver over HTTP/2 to reduce the connection overhead
HTTP based streaming protocols require a lot of connections for downloading manifests (to know which segments to download) and the segments themselves. Three trends have increased the overhead of these connections: (i) the shift to HTTPS, which requires an extra TLS handshake, (ii) the delivery of video and audio in separate streams (referred to as unmuxed delivery), and (iii) smaller segment sizes (these used to be 10 seconds and are now 6 seconds or less).
HTTP/2 helps by allowing a connection to be re-used for subsequent requests. The graph below, from a research paper by Mueller et al., compares the throughput performance of HTTP/1 and HTTP/2. Their test shows that HTTP/2 can sustain a higher throughput, especially as the latency between the client and the server increases. Note that HTTP/1.1 with pipelining has a similar performance advantage, except that many browsers disable this functionality.
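To get a feel for why connection re-use matters more as latency grows, here is a deliberately simplified Python model. It is a sketch, not a reproduction of the paper's measurements. It assumes each new HTTPS connection costs 3 round trips before the first request (1 for TCP plus 2 for a full TLS 1.2 handshake), and it ignores slow start, session resumption and multiplexing. The function name and the example numbers are made up for illustration.

```python
def fetch_time_ms(rtt_ms, transfer_ms, requests, reuse_connection,
                  handshake_rtts=3):
    """Time to fetch `requests` files one after another.

    handshake_rtts is an assumption: 1 round trip for TCP
    plus 2 for a full TLS 1.2 handshake.
    """
    per_request = rtt_ms + transfer_ms  # request/response plus download
    handshake = handshake_rtts * rtt_ms
    if reuse_connection:
        # Pay for the handshake once, then re-use the connection.
        return handshake + requests * per_request
    # Pay for the handshake on every single request.
    return requests * (handshake + per_request)


# 10 files, 200 ms download each, 50 ms round trip time.
print(fetch_time_ms(50, 200, 10, reuse_connection=True))   # 2650
print(fetch_time_ms(50, 200, 10, reuse_connection=False))  # 4000
```

In this model the gap between the two cases grows linearly with the round trip time, which matches the direction of the trend described above.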
Make sure to check with your CDN provider(s) that they support HTTP/2 and have it enabled on your account.
2. Ensure that video and audio packets have the same length
Video and audio are counted in different units. Video is counted in frames, typically 25 frames per second in Europe. Audio is a bit trickier: AAC audio uses the concept of access units (or packets), and one access unit contains 1024 samples.
So if we have a stream that has 25 frames per second of video and an AAC audio track with a sample rate of 48 kHz, this would mean that per second we have 25 video frames, but 48,000/1024 = 46.875 AAC access units. It’s not possible to have a fractional number of access units or frames. Therefore if we want the video and audio data in a segment to have the same length, we would need to find a length that has a whole number of frames and access units.
In our example good segment sizes would be 1.92, 3.2, 6.4 or 16 seconds. For example, for 6.4 seconds:
6.4 * 25 = 160 video frames
6.4 * 48,000/1024 = 300 audio access units
Note that you will also have to change the size of the GOP (Group of Pictures, or the number of frames between keyframes) to ensure each segment starts with a keyframe. For example, a GOP of 40 frames ensures that 4 complete GOPs (and consequently 4 keyframes) fit into a 6.4 second segment.
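The alignment rule can be checked with a short Python sketch. It uses exact fractions to avoid floating point rounding. The function name is made up for illustration, and the defaults (25 frames per second, 48 kHz, 1024 samples per access unit, GOP of 40 frames) are the values from the example above.

```python
from fractions import Fraction
from math import gcd, lcm  # math.lcm needs Python 3.9 or newer


def smallest_aligned_duration(fps, sample_rate, samples_per_unit=1024):
    """Shortest duration holding a whole number of frames and access units."""
    frame = Fraction(1, fps)                        # seconds per video frame
    unit = Fraction(samples_per_unit, sample_rate)  # seconds per access unit
    # Least common multiple of two fractions in lowest terms.
    return Fraction(lcm(frame.numerator, unit.numerator),
                    gcd(frame.denominator, unit.denominator))


base = smallest_aligned_duration(25, 48000)
print(float(base))  # 0.32

# A duration is aligned when it is a whole multiple of the base duration.
for text in ["1.92", "2", "3.2", "6.4", "16"]:
    print(text, Fraction(text) % base == 0)

# The 6.4 second example: frames, access units, and the GOP check.
frames = Fraction("6.4") * 25
units = Fraction("6.4") * 48000 / 1024
print(frames, units, frames % 40 == 0)  # 160 300 True
```

Every aligned duration in this example is a multiple of 0.32 seconds (8 frames, or 15 access units), which is why 2 seconds fails the check while 1.92 seconds passes.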
Having segments with an equal duration of video and audio ensures compatibility with devices that might have difficulty handling slightly out-of-sync segments. It also makes for a smaller manifest size if you use MPEG-DASH with a time based segment template (which you should).
3. Reduce segment duration to 2 seconds
Apple’s recommendation for the target duration of a segment is presently 6 seconds. Originally the recommendation used to be 10 seconds, and after that 9 seconds. How can we be sure which is the right length to use? Luckily the team at Bitmovin did an analysis of the correlation between segment length and throughput.
In the graph below you can see that they tested different segment sizes using persistent connections (HTTP/2) and non-persistent connections (HTTP/1). Their finding was that 5 to 8 second segments are optimal for maximising streaming throughput over non-persistent connections, and that 2 to 3 second segments are optimal over persistent connections. Practically speaking, this means that 2 to 3 second segments are optimal when streaming over HTTP/2.
Furthermore, the segment duration has a direct impact on the latency of a live stream. Most players will buffer 3 segments before starting playback (independent of the length of the segments). If you are using 6 second segments, the stream will run a minimum of 3 * 6 = 18 seconds behind the live point.
Combining what we learned, it makes sense to reduce the segment duration to 2 seconds. Of course, 2 seconds does not align the video and audio tracks perfectly, so we adjust the segment size slightly to 1.92 seconds (or 48 frames).
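The adjustment from 2 to 1.92 seconds can be sketched in Python as snapping a target duration to the nearest aligned one. The 0.32 second step is derived from the example values used earlier (25 frames per second and 48 kHz AAC, where 8 frames last exactly as long as 15 access units). The names are made up for illustration.

```python
from fractions import Fraction

# Assumption: 25 fps video and 48 kHz AAC, so 8 frames = 15 access units.
ALIGNED_STEP = Fraction(8, 25)  # 0.32 seconds


def nearest_aligned_duration(target_seconds, step=ALIGNED_STEP):
    """Snap a target segment duration to the closest multiple of `step`."""
    multiples = max(1, round(Fraction(target_seconds) / step))
    return multiples * step


segment = nearest_aligned_duration(2)
print(float(segment))      # 1.92
print(segment * 25)        # 48 (video frames per segment)
print(float(3 * segment))  # 5.76 (seconds of delay with a 3 segment buffer)
```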
Conclusion
By combining HTTP/2 delivery with aligned video and audio tracks and an optimal segment duration of 1.92 seconds, it is possible to create the best possible environment for the delivery of streaming live video and audio. This not only ensures viewers have the best possible streaming experience, but also reduces the end-to-end latency of the stream.
In VRT’s case we used 6 second segments before, and switched to 1.92 seconds. This reduced the live delay from 30 seconds to 10 seconds. We tested the latency by comparing the time burned into the video at broadcast (at the top right of the video in the above image) with the local time on the laptop (printed above the video). The difference is exactly 10 seconds.
It is possible to reduce the live delay further, but this would require further changes to the infrastructure. If you are interested in learning more about low latency technologies, this is an interesting article looking at different methodologies to reduce the live delay.
If you have questions or would like to discuss further, feel free to leave a comment or connect via Twitter @janthefox