This post is based on https://jacksonchen666.com/posts/2022-12-03/14-33-00/.
When running Synapse as a Matrix server, it will eventually take up a lot of disk space, especially when it is a federated server where local users have joined a lot of busy rooms.
How much data is used?
The data for Synapse is stored in a database, and the media files are (by default) stored on the filesystem. So how much data is currently in use?
How big is the database?
A normal Synapse installation should use a PostgreSQL database, so we will first have a look at the current size of that database, to be able to see how much space we have reclaimed after all the steps are done.
postgres@rhuarc:~$ psql -c "SELECT pg_size_pretty(pg_database_size('synapse'));" -t
15 GB
postgres@rhuarc:~$
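Most of that space usually sits in a handful of tables; on a typical Synapse installation state_groups_state is by far the largest. If you want to check where the space goes, a query along these lines works (a sketch, run as the postgres user):

psql -d synapse -c "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) FROM pg_stat_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 5;"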
The filesystem
The local and remote media is (by default) stored on the filesystem in /var/lib/matrix-synapse/media. This can be changed with the media_store_path setting in homeserver.yaml, which I did, so my data is stored on a separate mount point (/mnt/data), together with some other data.
Local media should not be removed, because that is the source from which other homeservers fetch it.
stefan@rhuarc:~$ df -h /mnt/data
Filesystem Size Used Avail Use% Mounted on
/dev/sda 30G 23G 5.2G 82% /mnt/data
stefan@rhuarc:~$
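If you want to see how much of that is media, and how it splits between local and remote content, you can check the media store's subdirectories (local_content and remote_content are the default directory names; substitute your own media_store_path):

du -sh /var/lib/matrix-synapse/media/local_content /var/lib/matrix-synapse/media/remote_content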
Setting up synadm
It is easiest to use synadm for
some of the steps.
So I installed synadm in a Python virtual environment in /opt/venvs/ and activated that environment to configure it.
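For reference, creating such an environment looks roughly like this (synadm is available on PyPI; the path is of course up to you):

sudo python3 -m venv /opt/venvs/synadm
sudo /opt/venvs/synadm/bin/pip install synadm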
stefan@rhuarc:~$ source /opt/venvs/synadm/bin/activate
(synadm) stefan@rhuarc:~$ synadm config
Running configurator...
Synapse admin user name [@<user>:<matrix domain>]:
Synapse admin user token [REDACTED]:
Synapse base URL [http://localhost:8008]:
Synapse Admin API path [/_synapse/admin]:
Matrix API path [/_matrix]:
Default output format (yaml, json, minified, human, pprint) [human]:
Default http timeout [7]: 30
Homeserver name ("auto-retrieval" or the domain part in your MXID) [auto-retrieval]: <matrix domain>
Verify certificate [True]: False
Server discovery mode (used with homeserver name auto-retrieval) (well-known, dns) [well-known]:
Restricting access to config file to user only.
(synadm) stefan@rhuarc:~$
This will store the configuration in ~/.config/synadm.yaml, so you can
change any settings in that file later if you want to.
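A quick way to verify the configuration is to run a harmless query against the server; for example, the version subcommand should print the Synapse version if the base URL and the admin token are correct:

synadm version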
Which rooms can be deleted?
After setting up synadm, we can create a list of room IDs that have no local users, limited to a maximum of 500 rooms.
(synadm) stefan@rhuarc:~$ synadm -o json room list -s joined_local_members -r -l 500 | \
jq -r '.rooms[] | select(.joined_local_members == 0) | .room_id'
!XwMBSkZtuwzgbhJozx:tchncs.de
[cut multiple lines]
(synadm) stefan@rhuarc:~$
Because room IDs do not really tell you anything about what a room is about, I created a new list which also shows the room name.
(synadm) stefan@rhuarc:~$ synadm -o json room list -s joined_local_members -r -l 500 | \
jq -r '.rooms[] | select(.joined_local_members == 0) | "\(.room_id) \(.name)"'
!XwMBSkZtuwzgbhJozx:tchncs.de null
[cut multiple lines]
(synadm) stefan@rhuarc:~$
After checking that there were no rooms I really wanted to keep (although that would be strange when there is no local user in the room), I used the first list to delete the rooms via synadm. This shows the details of each room that is about to be deleted. Sadly I was not able to find an option to skip the question, so I had to confirm every deletion.
(synadm) stefan@rhuarc:~$ for room in $(synadm -o json room list -s joined_local_members -r -l 500 | \
jq -r '.rooms[] | select(.joined_local_members == 0) | .room_id'); do
synadm room delete "${room}";
done
room_id <room ID>
name
canonical_alias
joined_members 0
join_rules
guest_access
history_visibility
state_events 10
avatar
topic
room_type
joined_local_members 0
version 10
creator @heisenbridge:<matrix domain>
encryption
federatable True
public False
joined_local_devices 0
forgotten False
Total members in room: 0
Are you sure you want to delete this room? (y/N):
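If you are confident about the list, a generic shell workaround is to feed a y into each prompt via stdin. This assumes the confirmation prompt reads from standard input, and it answers yes for every room, so double-check the list before running it:

for room in $(synadm -o json room list -s joined_local_members -r -l 500 | \
jq -r '.rooms[] | select(.joined_local_members == 0) | .room_id'); do
    echo y | synadm room delete "${room}";
done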
After deleting unused rooms, I also wanted to clear some disk space. So I chose to remove remote media that was cached on the system and had not been accessed since 2025-01-01.
(synadm) stefan@rhuarc:~$ synadm media purge -b 2025-01-01
deleted 0
(synadm) stefan@rhuarc:~$ df -h /mnt/data
Filesystem Size Used Avail Use% Mounted on
/dev/sda 30G 19G 9.3G 67% /mnt/data
(synadm) stefan@rhuarc:~$
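If you want to purge remote media regularly, the cutoff date can be computed on the fly. A sketch using GNU date, keeping roughly the last 90 days of cached remote media:

synadm media purge -b "$(date -d '90 days ago' +%Y-%m-%d)"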
Compressing the state in the database
To get a smaller database I used the rust-synapse-compress-state tool. I built the binary on a local Debian 12 VM, so I didn't have to install all the development tools on the system that is running Synapse.
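The build itself is a standard Cargo build; a sketch, assuming a working Rust toolchain and the build dependencies documented by the project:

git clone https://github.com/matrix-org/rust-synapse-compress-state.git
cd rust-synapse-compress-state
cargo build --release
# the synapse_auto_compressor binary should end up under target/release/;
# copy it to the machine running synapse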
After that I ran the tool with the same options as the example: -c sets the chunk size (the number of state groups processed at once) and -n the number of chunks to compress per run. I don't really know what the impact of using different values is.
(synadm) stefan@rhuarc:~$ synapse_auto_compressor -p postgresql://<user>:<password>@localhost/synapse -c 500 -n 100
[2025-04-25T20:21:31Z INFO synapse_auto_compressor] synapse_auto_compressor started
[2025-04-25T20:21:32Z INFO synapse_auto_compressor::manager] Running compressor on room !bAPwGxoHBivmFAyISG:<matrix domain> with chunk size 500
[cut multiple lines]
The first time I ran the tool, it took about 1.5 hours to complete. So what changed in the size of my database?
postgres@rhuarc:~$ psql -c "SELECT pg_size_pretty(pg_database_size('synapse'));" -t
15 GB
postgres@rhuarc:~$
Apparently nothing, but that is understandable: when the data gets smaller, PostgreSQL keeps the freed space around and re-uses it where possible, so the files on disk do not shrink unless you vacuum the database.
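You can see how much reclaimable space has piled up by looking at the dead-tuple counters, which is what a vacuum cleans up:

psql -d synapse -c "SELECT relname, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5;"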
Database changes
Database reindexing
Another thing we can do before vacuuming the database is reindexing it (REINDEX CONCURRENTLY requires PostgreSQL 12 or newer).
postgres=# \c synapse
You are now connected to database "synapse" as user "postgres".
synapse=# REINDEX DATABASE CONCURRENTLY synapse;
WARNING: cannot reindex system catalogs concurrently, skipping all
REINDEX
synapse=# SELECT pg_size_pretty(pg_database_size('synapse'));
pg_size_pretty
----------------
14 GB
(1 row)
synapse=#
So the reindexing at least made the database a bit smaller :-)
Vacuuming the database
First I did a normal vacuum, because for this action the application does not need to be stopped: a plain VACUUM only marks dead rows as reusable and does not take an exclusive lock on the database.
synapse=# VACUUM;
VACUUM
synapse=# SELECT pg_size_pretty(pg_database_size('synapse'));
pg_size_pretty
----------------
14 GB
(1 row)
synapse=#
That didn't seem to help with the size of the database, so I did a full vacuum. A VACUUM FULL rewrites every table into a new file and takes an exclusive lock, so first I had to stop the application.
stefan@rhuarc:~$ sudo systemctl stop matrix-synapse.service
And then do the full vacuum.
synapse=# VACUUM FULL;
VACUUM
synapse=# SELECT pg_size_pretty(pg_database_size('synapse'));
pg_size_pretty
----------------
10 GB
(1 row)
synapse=#
So this helped to make the database smaller, and I will probably have to do this again when the disk starts filling up again.
And of course, I had to start the application again.
stefan@rhuarc:~$ sudo systemctl start matrix-synapse.service
Creating a crontab entry for synapse_auto_compressor
To keep the state information minimal during normal use, I created a script to run the synapse_auto_compressor tool every day. To make sure we don't leak the database credentials via cron's mail, it is a very basic script that discards all output of the compressor.
#!/bin/bash
# Discard all output so the connection string never ends up in cron mail;
# only report a generic message when the compressor fails.
/home/stefan/bin/synapse_auto_compressor -p postgresql://<dbuser>:<dbpassword>@localhost/synapse -c 500 -n 100 >/dev/null 2>&1 || echo "synapse_auto_compressor failed"
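Because the database password is embedded in the script, it should only be readable (and executable) by its owner:

chmod 700 /home/stefan/bin/run_synapse_auto_compressor.sh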
The crontab entry, which runs the script every day at 02:30:
30 2 * * * /home/stefan/bin/run_synapse_auto_compressor.sh
Conclusion
After doing multiple things, I was able to reduce the disk usage by about 4.1 GB (media) + 5 GB (database) = 9.1 GB. And depending on how active the server will be, it might be more or less the next time.