Telegram VOIP calls using Python

Aug 10, 2018

Home Automation, Python

I’ve been experimenting with a Home Assistant based automation setup at home. It started as a temperature control and monitoring system for a mini greenhouse, and quickly evolved into a modest-sized system with a few devices. Using a combination of ESP8266s / ESP32s, MicroPython and Arduino Framework code, I managed to connect my home alarm, a couple of doors and the HVAC system with relative ease, and I once again felt the joy of unburdened programming …until I got to the doorbell.

That’s when, with my mind fueled by the home automation craze, I thought: “Hey, it’d be cool to be able to listen in on my doorbell from anywhere” (It seemed like a good idea at the time… you had to be there). Getting the sound to the Raspberry Pi that acts as the controller of the system would be relatively simple…but how to send the audio to my phone?

It turns out that there aren’t that many cross-platform applications that run on Linux (and ARM Linux at that!) and can do VOIP calls to Android or iOS. I needed something that was easy to use, ran on ARM Linux and Android, preferably was open source, had security features and could be controlled via scripting. Long story short: I didn’t find anything suitable, but! Telegram was almost a perfect fit, minus the easy to use part.

I had been using Telegram for quite some time for group messaging (and loving it), and I was marginally aware that their clients were open source and supported voice. So if I could somehow run their Linux client on the Raspberry Pi, I would be set. However, their desktop client is not scripting friendly (or documented, at all), and their CLI client is mostly a test platform for APIs and doesn’t support voice calling (and it’s not documented, at all). Further, while they provide code for all the parts of the process, the examples and documentation are scattered through the internets and difficult to stumble across.

So, here’s how I did it.

Telegram has different APIs that allow you to talk to their servers and make calls: there’s the Telegram API proper, tdlib (a “simplified” wrapper library around the Telegram API), the bots API (a subset of the Telegram API that is available to bots), and then tgvoip, which does the VOIP part.

If you want to make calls, you have to get an app id. This identifies your own custom Telegram client, the kind that “humans” use, as opposed to bots. Logging in to apps is only allowed via a Telegram registered phone number, and bots cannot access VOIP calls (at least at the time of this writing).

The first piece of the puzzle is tdlib. If you look at Telegram’s core API, it’s complex. tdlib simplifies the interaction with this asynchronous API by providing another asynchronous API that manages all the low level implementation details (such as computing encryption key hashes, decrypting the database, etc). There are a few Python wrappers for tdlib. The best one I found, in my opinion, was python-telegram, which I forked here with a couple of minor fixes: one to be able to receive all of tdlib’s messages in the user app, and one to control tdlib’s verbosity level, which is a good source of information when debugging issues. I recommend reading python-telegram’s tutorial to get an idea of how the wrapper works.

tdlib offers two methods of usage. The first one is linking directly to its low level functions; the other one is via a (yes, yet another) JSON API called tdjson, which is what the Python wrapper presented above uses.
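To give an idea of what travels over tdjson, here is a minimal sketch of the message shape, not the wrapper’s actual code. Requests and updates are JSON objects whose "@type" field names the method or object, and an optional "@extra" field is echoed back so you can match a response to its request. The helper names make_request and parse_update are made up for this illustration, and the sample response is hand-written rather than captured from a real session.

```python
import json


def make_request(method, params=None, extra=None):
    """Serialise a tdjson-style request to a JSON string (hypothetical helper)."""
    request = {'@type': method}
    request.update(params or {})
    if extra is not None:
        # tdlib echoes '@extra' back in the matching response
        request['@extra'] = extra
    return json.dumps(request)


def parse_update(raw):
    """Parse a JSON string received from tdjson into (type, payload)."""
    payload = json.loads(raw)
    return payload.get('@type'), payload


# A request as it would be handed to tdjson
wire = make_request('getMe', extra='req-1')

# A hand-written stand-in for what could come back
kind, payload = parse_update('{"@type": "user", "id": 42, "@extra": "req-1"}')
```

The Python wrapper hides this serialisation from you, but knowing the shape helps a lot when reading tdlib’s verbose logs.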

Going beyond the basic tutorial, I started looking for VOIP tutorials, and came up empty. So going through tdlib’s API list of functions, a few that seemed interesting showed up: createCall, acceptCall, etc. I decided to give those a go, and actually got my phone to ring!
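As a rough sketch of what those two calls take, the helpers below build the method name and parameters for createCall and acceptCall. The helper names are invented for this example. The protocol fields mirror the ones I ended up using in my script, and layer 65 is simply the value that worked for me at the time, so treat all of it as an assumption to check against the tdlib version you build.

```python
def call_protocol(layer=65, p2p=True, reflector=True):
    """Build a callProtocol object; layer 65 is an assumption, not a constant."""
    return {
        'udp_p2p': p2p,
        'udp_reflector': reflector,
        'min_layer': layer,
        'max_layer': layer,
    }


def create_call_request(user_id, protocol=None):
    """Method name and params for starting an outgoing call (hypothetical helper)."""
    return 'createCall', {
        'user_id': int(user_id),
        'protocol': protocol or call_protocol(),
    }


def accept_call_request(call_id, protocol=None):
    """Method name and params for answering an incoming call (hypothetical helper)."""
    return 'acceptCall', {
        'call_id': int(call_id),
        'protocol': protocol or call_protocol(),
    }


method, params = create_call_request('123456')
```

With the python-telegram wrapper, the resulting pair can be passed along as tg.call_method(method, params).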

Sadly, that’s all it did. Because I was missing the second piece of the puzzle: tgvoip. This is the library that actually does the UDP or TCP connection, encryption, Opus encoding and decoding, etc. You have to glue this to tdlib (somehow!) in order to have fully working Telegram VOIP calls.

tgvoip is C++ based, and gluing it to Python requires a C module. Luckily for you, I’ve made such a thing and published it here as pytgvoip. It even includes a Dockerfile for you Docker-crazed kids (in my day we installed dependencies by hand! BY HAND!), and I included a quick and dirty example of how to use it:

#!/usr/bin/env python3
# Telegram VOIP calls from python example
# Author Gabriel Jacobo https://mdqinc.com

import logging
import argparse
import os
import json
import base64
from telegram.client import Telegram
from tgvoip import call


def setup_voip(data):
    # state['config'] is passed as a string, convert to object
    data['state']['config'] = json.loads(data['state']['config'])
    # encryption key is base64 encoded
    data['state']['encryption_key'] = base64.decodebytes(data['state']['encryption_key'].encode('utf-8'))
    # peer_tag is base64 encoded
    for conn in data['state']['connections']:
        conn['peer_tag'] = base64.decodebytes(conn['peer_tag'].encode('utf-8'))
    call(data)

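def handler(msg):
    # Receives tdlib updates; uncomment the print to dump them while debugging
    #print ("UPDATE >>>", msg)
    if msg['@type'] == 'updateCall':
        data = msg['call']
        # 'outgoing' is the module-level createCall result set in __main__ below
        if data['id'] == outgoing['id'] and data['state']['@type'] == 'callStateReady':
            setup_voip(data)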


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('api_id', help='API id')  # https://my.telegram.org/apps
    parser.add_argument('api_hash', help='API hash')
    parser.add_argument('phone', help='Phone nr originating call')
    parser.add_argument('user_id', help='User ID to call')
    parser.add_argument('dbkey', help='Database encryption key')
    args = parser.parse_args()

    tg = Telegram(api_id=args.api_id,
                api_hash=args.api_hash,
                phone=args.phone,
                td_verbosity=5,
                files_directory = os.path.expanduser("~/.telegram/" + args.phone),
                database_encryption_key=args.dbkey)
    tg.login()

    # if this is the first run, library needs to preload all chats
    # otherwise the message will not be sent
    r = tg.get_chats()
    r.wait()


    r = tg.call_method('createCall', {'user_id': args.user_id, 'protocol': {'udp_p2p': True, 'udp_reflector': True, 'min_layer': 65, 'max_layer': 65} })
    r.wait()
    outgoing = r.update

    tg.add_handler(handler)
    tg.idle()  # blocking waiting for CTRL+C

Essentially what this does is use tdlib to issue a createCall to a Telegram user_id (getting the user_id from the phone number is its own thing so I won’t explain it here, but Google’s your friend). tdlib will initiate and negotiate the call for you (that’s when the other phone starts ringing!) and eventually send an updateCall callback to your handler with a callStateReady state. This means the other user picked up the call, and now we have to pass the call information to tgvoip, and eventually manage the disconnection (not shown here).
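Since the script only reacts to callStateReady, here is one possible way to route the other call states too, as a starting point for the disconnection handling. It is a sketch, not what my script does: next_action and the action names are invented for this example, and how you actually stop tgvoip depends on the binding. The state names are the ones tdlib reports inside updateCall, as far as I can tell from its API list.

```python
# Call states grouped by what a script would typically do with them
WAITING = {'callStatePending', 'callStateExchangingKeys'}
FINISHED = {'callStateHangingUp', 'callStateDiscarded', 'callStateError'}


def next_action(msg, call_id):
    """Decide what to do with a tdlib update for the call we care about."""
    if msg.get('@type') != 'updateCall':
        return 'ignore'
    data = msg['call']
    if data['id'] != call_id:
        # Some other call, not the one we started
        return 'ignore'
    state = data['state']['@type']
    if state == 'callStateReady':
        return 'start_voip'   # hand the call data to tgvoip
    if state in FINISHED:
        return 'stop_voip'    # tear down audio and clean up
    if state in WAITING:
        return 'wait'         # still ringing or negotiating keys
    return 'ignore'


ready = {'@type': 'updateCall',
         'call': {'id': 1, 'state': {'@type': 'callStateReady'}}}
action = next_action(ready, call_id=1)
```

Keeping the decision in a small pure function like this makes it easy to test without a phone ringing every time.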

So I had calls to my phone from my Linux desktop script finally working. But if you remember my initial goal, I was still nowhere near it…how to get there will be the topic of my next post.