Why my ESP32 firmware speaks Lua
(and why I did not write my own little language)
The wood boiler firmware does not just read sensors. It decides when to escalate, when to send alerts, when to back off and wait. Those decisions are the part most likely to change. The sensor wiring is settled. The MQTT plumbing is settled. The rules keep moving.
At the beginning of the project the rules lived in C++ like everything else. Every time I wanted a different threshold, a different hysteresis, or a different combination of conditions for the device, I rebuilt the firmware, pushed it as an OTA update, and waited for the device to fetch it and reboot. That is fine for a one-off. It is a trap if every deployment needs its own decision tree, and it was not the workflow I wanted.
The honest options were:
- Roll my own little language. A purpose-built rule format, parsed by code I would write. Very steep learning curve, and a months-long side quest I do not need.
- YAML config plus a hard-coded evaluator. Similar to ESPHome’s approach. Works well for the cases the evaluator covers. Falls apart the moment you want a condition the evaluator can’t handle. Not as flexible as I had in mind.
- An embedded scripting language. Drop a real interpreter into the firmware, expose the things the device can already do, let scripts decide. It is also the approach I see almost daily in my work as a systems specialist.
Option three is where this ends up. The question was which language.
I picked Lua. Three reasons:
- It is designed to be embedded. The whole point of Lua, since its first release in 1993, is to live inside someone else’s C program. It is not fighting to be the firmware. It is fitting under it.
- The footprint fits within the ESP32 - Lua 5.3 compiles to roughly 150 KB. The interpreter, the standard library, the bindings I wrote on top, all of it lands in flash without crowding out the parts that actually need to be there.
- Lua leaves the door open for Blockly. Down the road I would like somebody who is not a programmer to be able to wire up rules visually. Blockly already has a Lua code generator. The block view emits Lua. The firmware runs Lua. The seam is already cut.
I considered some alternatives. Berry, from Tasmota, is smaller and lovely. The Blockly path would need a custom generator though, and Tasmota is the only big production user. MicroPython would mean replacing the firmware, not embedding in it. There were a few more candidates, and every one of them came with trade-offs. Lua had the fewest for what I wanted.
The rest of this post is what it looks like once the interpreter is in there - what the API exposes, what a real rule looks like, and what I had to do to make hot-reload not blow up in my face (which has already saved me some field trips).
Where the scripts live
Two files on the device’s flash:
/scripts/main.lua - runs at boot, and again on every reload
/scripts/rules.lua - the one I actually iterate on
main.lua runs every time the Lua state comes up. It is for the setup the script needs each time the interpreter starts fresh: registering EventBus subscriptions, MQTT topic handlers, helper functions. Stuff I rarely edit. rules.lua is the policy I actually change - thresholds, conditions, the response logic. The split is editorial, not technical. Either file can pull in more via dofile("/scripts/other.lua") if two is not enough.
Editing happens through the device’s web UI if I am on the same network, over the serial port if I am sitting in front of it, or through the MQTT shell from the previous post if I am anywhere else. Whichever path I take, the file lands on LittleFS and the device gets a reload trigger:
<prefix>/cmd/lua/reload
The device is already subscribed to that topic at boot. Any payload at all flips the switch. The interpreter tears down its state, opens a fresh one, re-registers every native binding, re-executes both scripts, and is back online inside a few hundred milliseconds. No reboot, no OTA, no driving anywhere, no field-flashing a microcontroller in the snow at -20 with freezing fingers.
There is one detail that took me an evening to get right and that I will come back to: anything rules.lua subscribes to on the EventBus or on MQTT during one run has to be invisible to the next run, or the device ends up firing old logic against new state. The fix is a generation counter. More on that in the gotchas.
What Lua can do inside the firmware
The interpreter is sandboxed - no file system access beyond LittleFS, no network sockets, no shell-out. What it does have is a set of bindings into the parts of the firmware that already exist:
| Module | What it gives the script |
|---|---|
| Log | Log.info(msg), Log.warn(msg), Log.error(msg) |
| MQTT | MQTT.publish(topic, payload), MQTT.subscribe(topic, fn) |
| EventBus | EventBus.subscribe(event, fn) - the in-process pub/sub |
| Config | Config.get("a.b.c") - dot-notation read from device config |
| Node | Node.restart(), Node.uptime(), Node.ip(), Node.setTimeout(ms, fn) |
| JSON | JSON.decode(str) - parse to Lua table |
That table is what Lua sees out of the box. The interesting part is that other modules can register more bindings of their own. The Telegram module exposes Telegram.broadcast(text) to send to every configured chat, plus Telegram.send(chat_id, text) for a specific one. The GNSS module exposes the last fix as a table. The shape of the API depends on which modules are compiled into the build. I will come back to how that wiring works.
In practice almost every rule uses three things: subscribe to an event, decide whether to act, publish or alert. So most of rules.lua ends up looking like this:
    EventBus.subscribe("sensor.temperature", function(data)
      if data.value > Config.get("alerts.overtemp_threshold") then
        Log.warn("overtemp: " .. data.value)
        Telegram.broadcast("Boiler overtemp: " .. data.value .. " C")
      end
    end)
Six lines, four bindings, one real alert. The script never has to know that Telegram.broadcast ends up in an HTTPS POST, or that Config.get walks a JSON tree, or that the data table came from a C++ event marshalled to Lua. The firmware does the work. The script makes the decisions.
Modules bring their own bindings
Log, MQTT, EventBus, Config, Node, and JSON are the core surface. They are always available because the firmware components behind them are always compiled in. The interesting bit is how the optional modules - Telegram, GNSS, the various sensor drivers - extend the Lua API without the script engine knowing anything about them.
Every module registers its bindings at boot. The registration call looks like this:
    ScriptEngine::addBindings([](lua_State* L) {
        static const luaL_Reg telegramLib[] = {
            {"send", lua_telegram_send},
            {"broadcast", lua_telegram_broadcast},
            {nullptr, nullptr}
        };
        luaL_newlib(L, telegramLib);
        lua_setglobal(L, "Telegram");
    });
The script engine keeps a small static list of these registrar functions. On boot, after the Lua state is created and the core bindings are loaded, every registrar gets called in turn. On hot reload, the list survives because it lives outside the Lua state. The new state gets the same set of bindings, automatically.
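To make the "list lives outside the Lua state" point concrete, here is a minimal sketch of how such a registrar list could be kept. It is not the firmware's actual code: ScriptEngineSketch, FakeState, and createState are names invented for illustration, and FakeState stands in for lua_State so the example compiles without the Lua headers.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stand-in for lua_State (assumption, so the sketch builds without Lua).
// It only records which global tables the registrars installed.
struct FakeState {
    std::set<std::string> globals;
};

// Hypothetical engine. The registrar list is static, so it outlives any
// individual interpreter state.
class ScriptEngineSketch {
public:
    using Registrar = std::function<void(FakeState&)>;

    // Modules call this once at boot to contribute their bindings.
    static void addBindings(Registrar r) {
        registrars().push_back(std::move(r));
    }

    // Build a fresh state and replay every registrar into it. The same
    // path serves first boot and hot reload.
    static FakeState createState() {
        FakeState state;
        state.globals.insert("Log");  // core bindings are loaded first
        for (const auto& r : registrars()) {
            r(state);
        }
        return state;
    }

private:
    // A function-local static sidesteps initialization-order problems if
    // a module registers from its own static initializer.
    static std::vector<Registrar>& registrars() {
        static std::vector<Registrar> list;
        return list;
    }
};
```

Calling createState() twice yields two independent states with the same set of globals, which is the property hot reload relies on: nothing about the module set has to be remembered inside the interpreter itself.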
The upshot is that the API a rules.lua script sees is exactly the set of features compiled into that particular build. A firmware with the Telegram module compiled in has Telegram.*. A firmware without it does not. The script does not have to guess - if Telegram is nil, the feature is not present in this build, and the script can decide whether to fall back or just fail loudly.
This is the part I like the most. The Lua API and the C++ module set are the same surface, exposed twice. The compile-time choice of “what is in this build” is also the run-time choice of “what can scripts see.” No drift, no separate registry, no extra config file mapping one to the other.
Gotchas, the ones that took an evening each
Stale callbacks after reload
The naive version of hot reload tears down the Lua state and builds a new one. What it does not tear down is the C++ side of every EventBus.subscribe and MQTT.subscribe the old script made. Those subscriptions live on the firmware side of the fence. They still fire. And they still hold references into a Lua state that no longer exists.
The first time I hit this, the device kept publishing alerts from a rule I had just deleted. Reloaded the script, alerts kept coming. Reloaded again, two copies of the alert. The old callbacks were firing in addition to the new ones.
The fix is a generation counter. Think of it as a version number that goes up by one every time the script is reloaded. Every subscription the script makes gets stamped with the version that was current at the time. When an event fires, the firmware checks the stamp on the subscription against the current version. If the stamps do not match, the subscription is from an old script and gets ignored. The new script’s subscriptions, stamped with the new version, are the only ones that run.
Nothing has to be unwired by hand. Old subscriptions just stop having any effect, and they fall away on their own as the script keeps running.
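The generation check described above can be sketched in a few lines of plain C++. This is an illustration under assumptions, not the firmware's implementation: EventBusSketch, bumpGeneration, and the string payload are invented names, and dropping stale entries during publish is one possible way for old subscriptions to "fall away on their own".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event bus with generation-stamped subscriptions.
class EventBusSketch {
public:
    using Handler = std::function<void(const std::string& payload)>;

    // Stamp each new subscription with the generation current right now.
    void subscribe(const std::string& event, Handler fn) {
        subs_.push_back({event, generation_, std::move(fn)});
    }

    // Called once per hot reload, before the new scripts run.
    void bumpGeneration() { ++generation_; }

    // Deliver an event. Entries from an older generation are never called;
    // they are erased when encountered (lazy cleanup, an assumption here).
    // Assumes handlers do not subscribe from inside a callback.
    void publish(const std::string& event, const std::string& payload) {
        for (std::size_t i = 0; i < subs_.size();) {
            if (subs_[i].generation != generation_) {
                subs_.erase(subs_.begin() + static_cast<std::ptrdiff_t>(i));
                continue;  // stale: removed without being invoked
            }
            if (subs_[i].event == event) {
                subs_[i].fn(payload);
            }
            ++i;
        }
    }

    std::size_t size() const { return subs_.size(); }

private:
    struct Subscription {
        std::string event;
        std::uint32_t generation;
        Handler fn;
    };
    std::vector<Subscription> subs_;
    std::uint32_t generation_ = 0;
};
```

The important property is that a stale handler is skipped before it is invoked, so nothing ever calls back into an interpreter state that has already been closed.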
Parsing scripts without blowing the heap
The other one is more boring but just as important. Lua’s parser, by default, wants the whole script handed to it in one piece. That means finding a single free slab of RAM big enough to hold the file. On the older ESP32-WROOM (the classic chip, no extra RAM on board) with a heap that is already busy, that slab is exactly the thing that is not available. The device runs out of memory partway through parsing a perfectly fine script. On the newer ESP32-S3 with its extra PSRAM, the same script loads without trouble.
The fix is to feed the parser the file in small pieces. Lua supports this directly - hand it a reader function and it will ask for bytes as it needs them, in whatever size chunk you want to give it. A 256-byte buffer is enough. The script can be any size, and the parser never sees more than a sliver at a time.
Same fix every embedded Lua project ends up writing. Worth mentioning because the alternative is a confusing “everything works on the S3, nothing works on the WROOM” bug that does not look like a Lua problem at all.
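For reference, a reader of this kind could look roughly like the sketch below. It follows the lua_Reader signature from the Lua C API, but ChunkSource and chunkReader are illustrative names, and it assumes the script is opened through stdio rather than whatever file API the firmware really uses. lua_State is only forward-declared so the snippet compiles without the Lua headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

struct lua_State;  // opaque here; the real definition comes from lua.h

// State handed to the reader through lua_load's `data` pointer.
struct ChunkSource {
    std::FILE* file;   // assumption: script opened via stdio
    char buffer[256];  // the only RAM the loader needs for file contents
};

// Matches Lua's lua_Reader signature. Each call returns the next piece of
// the file and stores its length in *size. Returning nullptr with
// *size == 0 tells the parser the chunk has ended.
const char* chunkReader(lua_State* /*L*/, void* data, std::size_t* size) {
    assert(data != nullptr && size != nullptr);
    ChunkSource* src = static_cast<ChunkSource*>(data);
    *size = std::fread(src->buffer, 1, sizeof(src->buffer), src->file);
    return *size > 0 ? src->buffer : nullptr;
}
```

With the real headers included, the reader would be handed to the parser with something like lua_load(L, chunkReader, &src, "@/scripts/rules.lua", nullptr). The buffer must stay valid until the next reader call, which is why it lives in the ChunkSource struct rather than on the reader's stack.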
What is next
The script engine has been quietly running on the production wood boiler for weeks. Rules get edited from the couch. Alerts get tuned without rebuilding. The firmware does what it has always done. The decisions on top of it now live in a file I can edit from anywhere.
The next piece I would like to land is the block-based editor. Pick blocks from a palette, drop them on a canvas, see a rule emerge, push it to the device. Blockly emits Lua. The device runs Lua. That bridge is short. I will write that one up when it works.