From 9c1d743bfd697c996e350130508443d9c67b41cd Mon Sep 17 00:00:00 2001
From: jazzpi <jasper@mezzo.de>
Date: Wed, 3 Aug 2022 00:52:28 +0200
Subject: [PATCH] Explain clock synchronization mechanism

---
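Note: the STAGE 1 hop order and the NORMAL OPERATION trim decision described
in the comment below can be sketched in plain C roughly as follows. This is
only an illustration under stated assumptions, not the code in ClockSync.c:
hop_trim, estimate_hsi_hz and next_trim are hypothetical names, the trim delta
is passed in rather than measured, and clamping at the limits of the 5-bit
HSITRIM field is an assumption (the comment does not say what happens there).

```c
#include <assert.h>
#include <stdint.h>

/* HSITRIM is a 5-bit field in RCC_CR, so valid trim values are 0..31. */
#define HSITRIM_MAX 31u
#define HSI_TARGET_HZ 16000000u
#define SYNC_INTERVAL_MS 1000u

/* Hypothetical helper: trim value for the n-th hop around `center`.
 * n = 0 yields center; odd n go below it, even n go above it, with the
 * distance growing by `step` every second hop.
 * Assumption: values are clamped to the valid HSITRIM range. */
static uint8_t hop_trim(uint8_t center, uint32_t n, uint8_t step)
{
    assert(center <= HSITRIM_MAX);
    int64_t distance = (int64_t)step * (int64_t)(((uint64_t)n + 1u) / 2u);
    int64_t trim = (n % 2u) ? (int64_t)center - distance
                            : (int64_t)center + distance;
    if (trim < 0) {
        trim = 0;
    }
    if (trim > (int64_t)HSITRIM_MAX) {
        trim = (int64_t)HSITRIM_MAX;
    }
    return (uint8_t)trim;
}

/* f_real = 16 MHz * measured_ticks / 1000, as given in the comment.
 * measured_ticks is the local tick count between two CLOCK_SYNC frames. */
static uint32_t estimate_hsi_hz(uint32_t measured_ticks)
{
    return (uint32_t)((uint64_t)HSI_TARGET_HZ * measured_ticks /
                      SYNC_INTERVAL_MS);
}

/* Hypothetical helper: step the trim by one if the estimated frequency is
 * off by more than trim_delta_hz. A higher trim means a higher frequency,
 * so a clock that runs fast gets a lower trim. */
static uint8_t next_trim(uint8_t trim, uint32_t f_real_hz,
                         uint32_t trim_delta_hz)
{
    if (f_real_hz > HSI_TARGET_HZ &&
        f_real_hz - HSI_TARGET_HZ > trim_delta_hz) {
        return (trim > 0u) ? (uint8_t)(trim - 1u) : trim;
    }
    if (f_real_hz < HSI_TARGET_HZ &&
        HSI_TARGET_HZ - f_real_hz > trim_delta_hz) {
        return (trim < HSITRIM_MAX) ? (uint8_t)(trim + 1u) : trim;
    }
    return trim;
}
```

With a center of 16 and a step of 2, hop_trim(16, n, 2) for n = 0..6 yields
16, 14, 18, 12, 20, 10, 22, matching the sequence in the STAGE 1 section.
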
 Core/Src/ClockSync.c | 111 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 111 insertions(+)

diff --git a/Core/Src/ClockSync.c b/Core/Src/ClockSync.c
index 53d5af6..7a2c227 100644
--- a/Core/Src/ClockSync.c
+++ b/Core/Src/ClockSync.c
@@ -1,5 +1,116 @@
 #include "ClockSync.h"
 
+/**
+ * @file ClockSync.c
+ * @author Jasper v. Blanckenburg (j.blanckenburg@fasttube.de)
+ * @brief Clock synchronization mechanism -- slave side
+ * @version 0.1
+ * @date 2022-08-02
+ *
+ * @copyright Copyright (c) 2022
+ *
+ * OVERVIEW
+ * ========
+ * The slaves use the STM's internal clock (HSI), which is -- especially at
+ * higher temperatures -- quite inaccurate (±4% over the STM's working
+ * temperature range, according to the datasheet).
+ *
+ * Since the CAN bitrate is directly determined from the HSI (through prescaling
+ * & time quanta), an inaccurate HSI means an inaccurate CAN bitrate. Especially
+ * once the battery heats up, this leads to packet loss and ultimately the CAN
+ * controller entering Bus-Off due to too many transmission errors.
+ *
+ * The easy fix would be to use an external clock (HSE), i.e. a quartz crystal.
+ * Although a crystal is present on the slaves, it does not work on every one
+ * and is mounted on the (inaccessible) underside. Thus, we need to make do with
+ * the HSI.
+ *
+ * Fortunately, the HSI frequency can be trimmed through the HSITRIM bits in the
+ * RCC_CR register (see STM32F412 reference manual, section 6.2.2; as well as
+ * STM AN5067).
+ *
+ * The HSITRIM bits provide the mechanism for adjusting the HSI frequency, but
+ * we still need to determine which trim value to set.
+ * Since we don't really care about the absolute accuracy of the slaves' clocks,
+ * but rather their relative accuracy to the other nodes on the CAN bus
+ * (especially the master), we can synchronize the clocks to one another via
+ * timed CAN frames.
+ *
+ * TIMED CAN FRAMES
+ * ================
+ * As the master is the least affected by the battery heating up, and also had
+ * the most accurate clock during testing, we use it to generate the timed
+ * frames.
+ *
+ * It sends frames from timer interrupts and with high priority (low ID) to
+ * ensure minimal deviation from their intended frequency. It sends two separate
+ * kinds of frames: CLOCK_SYNC and MASTER_HEARTBEAT.
+ *
+ * The MASTER_HEARTBEAT frames are sent every 100 ms. Their purpose is simply to
+ * reliably have messages on the bus, so that the slaves can tell whether they
+ * are roughly in sync with the master by checking for their reception (see the
+ * FREQUENCY HOPPING section).
+ *
+ * The CLOCK_SYNC frames are sent every 1000 ms. They serve as the external
+ * clock source. The slaves continually trim their HSI according to the time
+ * measured between two CLOCK_SYNC frames (see the NORMAL OPERATION section).
+ *
+ * FREQUENCY HOPPING
+ * =================
+ * If the HSI is far out of sync with the master's clock (e.g. because the AMS
+ * was restarted with a warm battery), the slaves don't receive any CAN packets
+ * from the master and thus can't rely on the CLOCK_SYNC frames for
+ * synchronization. In this case, they rely on what is essentially frequency
+ * hopping.
+ *
+ * The frequency hopping mechanism has two stages: one to get in the right
+ * ballpark, and one to make the communication reliable enough for normal
+ * operation.
+ *
+ * STAGE 1
+ * -------
+ * Stage 1 trims the HSI until at least one MASTER_HEARTBEAT frame has been
+ * received. The trim alternates between lower and higher values, e.g. if the
+ * trim was initially 16, it will go through the following values:
+ *
+ * 16 -> 14 -> 18 -> 12 -> 20 -> 10 -> 22 -> ...
+ *
+ * Once a MASTER_HEARTBEAT frame has been received, the slave transitions to
+ * stage 2.
+ *
+ * STAGE 2
+ * -------
+ * Stage 2 trims the HSI further until at least three consecutive
+ * MASTER_HEARTBEAT frames have been received. The frequency alternates in the
+ * same fashion as in stage 1, but now around the frequency where a
+ * MASTER_HEARTBEAT frame was received in stage 1, and more slowly.
+ *
+ * Once three consecutive MASTER_HEARTBEAT frames have been received, the slave
+ * transitions to normal operation.
+ *
+ * NORMAL OPERATION
+ * ================
+ * During normal operation, the HSI is continually trimmed so that CLOCK_SYNC
+ * frames are received every 1000 ms. Since the slave measures time in
+ * milliseconds (via the HAL_GetTick() function), one tick in 1000 gives a
+ * measurement resolution of 0.1%. According to AN5067, each HSITRIM increment
+ * raises the clock frequency by approximately 0.3%, so a 0.1% resolution is
+ * fine enough to resolve a single trim step.
+ *
+ * By counting the ticks between two CLOCK_SYNC frames, the slave determines its
+ * actual HSI frequency (assuming the master clock is accurate):
+ *
+ * f_real = 16 MHz * measured_ticks / 1000
+ *
+ * If the real frequency differs from the target frequency (16 MHz) by more than
+ * the trim delta, the trim is incremented or decremented accordingly.
+ *
+ * The trim delta is determined dynamically: it is the difference between the
+ * real frequency before and after each trim.
+ *
+ * If the slave misses two consecutive CLOCK_SYNC frames for whatever reason, it
+ * returns to (stage 1) frequency hopping.
+ */
 #include "AMS_CAN.h"
 
 #include "stm32f412rx.h"