A few weeks ago during the development of one of my on-chain Solana games, I was chatting to a fellow Solana dev about my progress and how I’d spent the last few days writing unit tests. This dev casually mentioned that he finds Solana account management too complex, so just lets an LLM write all his unit tests for him.
As a security-conscious smart contract dev, this blew my mind. Anyone in this field who knows what they are doing will tell you that writing contracts and unit testing them are inseparable and highly consequential tasks. If you make a mistake in your frontend code, a user has a bad experience and you patch it. If you make a mistake in your smart contract code, funds get locked or drained and you can never get them back.
However, this dev’s experience speaks to a larger issue with Solana dev in general, internally known as chewing glass, which is that it can be somewhat hostile to devs, and difficult to get into. I lamented this fact many times back when I was learning Solana for the first time, and there are still many landmines that can trip you up when it’s least convenient.
So rather than just exasperatedly demanding that others lift their game, I decided to make life easier for those who need it. This is why I’ve published the SLAM Test Framework, which makes writing unit tests for Solana programs easier for people who currently find it hard. The name comes from the stack, Solana, LiteSVM, Anchor, Mocha. It’s something I developed for my own personal use several years ago and it has served me very well in that time, so now I’m making it public.
Note: in this article I will use the term smart contract even though within Solana we often just call them programs. This is purely for clarity. Potato, potahto.
Why Using an LLM for Unit Tests is a Bad Idea
Before diving into the framework it’s important to explain why relying on an LLM to write all your smart contract unit tests is such a bad idea. It comes down to thoroughness, and catching all edge cases. A poorly tested smart contract is in some ways worse than a completely untested smart contract, because you have an unearned confidence in the quality and security of that program.
If you live in a rough neighborhood, and from a cyber-crime POV crypto is a very rough neighborhood, you will act very differently in your house at night if you know you locked all the doors and windows, vs if you didn’t bother to check yet. Having a poorly tested smart contract that you think you tested well is like leaving your wallet and phone next to the front door and walking around naked, thinking the door is bolted but actually having no idea.
Due to how LLMs work, LLM code generation is fairly broad and generic. It’s why they have seen such success with projects that just need to throw something up and worry about the details later. They are renowned tech-debt and comprehension-debt machines, but the speed and cost make this an acceptable trade-off for many people building many things.
Unfortunately, they are not the precise, intelligent thinking machines that some people believe them to be. Wiring together different processes and creating feedback loops can decrease the rate of errors produced by LLMs and give a greater appearance of thinking to the outside observer, but when it boils down to it, they are still just predicting tokens based on their model and what they already have in context. This means that the closer you get to edge cases or unique behavior, the less reliable they are.
Smart contract unit tests are basically all edge cases.
Yes you will inevitably write tests to prove the core functionality is working as expected, but the meaty part of the work is making sure that your smart contract does what it should when things get weird and crazy. If your contract is so beige that you don’t need to write new and unique test cases then you probably didn’t need a new contract and could have used an existing one.
There are three ways in which poorly written unit tests can go wrong: false negatives, false positives, and omissions (implicit false positives). Generally, with LLM written unit tests, it is the latter that brings the most risk. I will explain the three of them here, continuing our locked house analogy:
False Negatives
A false negative, in a unit testing context, means you have a test that identified a problem that wasn’t actually there. Negative means the test fails, false negative means it actually shouldn’t have failed. So if you try to rectify your smart contract to patch this issue, you will actually be creating a problem.
In our house analogy, suppose to lock our front door we turn the key left. To unlock it we turn the key right.
A false negative would say “the key is turned left, therefore the door is unlocked”. This is wrong, but trusting your tests, you would turn the key right thinking the door was now locked, when in actual fact you had unlocked it.
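The analogy can be sketched in a few lines of hypothetical TypeScript (a toy model, not code from the framework): the implementation is correct, but the test encodes the wrong expectation, so it fails when nothing is wrong.

```typescript
// Toy door-lock model: in this house, turning the key LEFT locks the door.
type KeyTurn = "left" | "right";

// Correct implementation, matching the spec above.
function isLocked(turn: KeyTurn): boolean {
  return turn === "left";
}

// FALSE NEGATIVE: the assertion below is backwards, so the correct code
// "fails" the test. "Fixing" isLocked to satisfy this test would mean the
// door is really unlocked whenever you believe it is locked.
function falseNegativeAssertion(): boolean {
  return isLocked("left") === false; // wrong expectation, evaluates to false
}
```

The danger is not the failing test itself, but what you do next: changing correct code to satisfy an incorrect assertion.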
I recently launched an on-chain game, and despite writing all my own tests, decided to be thorough and had a Claude-addict friend put the program through his LLM in case there was something I had missed. Without going into technical specifics, it misunderstood a certain property whose use was dictated by external (real-world) behaviours, not implicit in the code alone. The LLM failed to understand the purpose of the flag and suggested inverting its behaviour, which, had I implemented it, would have created race conditions and vulnerabilities that locked certain functionality and funds.
Thankfully I had written my own tests, and so I didn’t take its advice.
False Positives
A false positive is when there is something wrong with your smart contract, but the tests fail to recognise it.
To revisit the locking door from before:
In our house analogy, suppose to lock our front door we turn the key left. To unlock it we turn the key right.
A false positive would say “the key is turned right, therefore the door is locked”. This gives us all the confidence of a locked front door, with none of the security.
Had I changed the code in the previous example, the LLM-written test would have given me a false positive assuring me that the new (incorrect) use of the property was actually the correct use.
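Continuing the same toy TypeScript sketch (hypothetical, not framework code): here the implementation is the buggy part, and the only assertion written never exposes it.

```typescript
// Toy door-lock model again, this time with a BUGGY implementation:
// it reports the door locked no matter which way the key is turned.
type KeyTurn = "left" | "right";

function isLocked(turn: KeyTurn): boolean {
  return true; // bug: ignores `turn` entirely; should be turn === "left"
}

// FALSE POSITIVE: this assertion passes, so we trust the lock, but the
// key-right case (where the bug lives) is never checked.
function falsePositiveAssertion(): boolean {
  return isLocked("left") === true; // passes despite the bug
}
```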
Omissions (Implicit False Positives)
The most serious and likely issues you will have with LLM-written tests are those caused by omission. This is when the LLM fails to create any test at all for a specific edge case or bug. This happens because, as mentioned earlier, the LLM is not thinking about the grander design or scope of the project, or how it will interact with the real world. In fact, it’s not even capable of this. It’s just predicting tokens to generate a file that looks like unit tests.
Implicit false positives mean you assume everything is fine, because despite having unit tests, you don’t have any tests saying there’s a problem. You assume the tests were thorough and complete, but there are issues you are completely unaware of.
In our house analogy once more, an implicit false positive would be asking someone whether they locked all the doors and windows and being told “yes”, when they were completely unaware of entire rooms at the back of your house that have doors and windows, all of which may still be unlocked.
The LLM-written tests for my recent game had this issue in spades. There were entire behaviour patterns and edge cases completely neglected by the tests. Going live with a smart contract with that level of test incompleteness is an excellent way to court disaster.
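In the same hypothetical TypeScript vein, an omission looks like this: the suite that exists is fine; the problem is the test that was never written.

```typescript
// Toy vault (illustrative only): deposits work, but withdraw has an
// unchecked overdraw bug.
function deposit(balance: number, amount: number): number {
  return balance + amount;
}

function withdraw(balance: number, amount: number): number {
  return balance - amount; // BUG: no check that amount <= balance
}

// The entire "suite": deposits are tested and pass; withdraw is never
// exercised at all, so no failing test ever points at the overdraw bug.
if (deposit(100, 50) !== 150) throw new Error("deposit test failed");

// The omitted test that WOULD have caught it:
//   withdraw(100, 500) drives the balance negative (-400)
```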
LLM Swiss Cheese
If you’re a vibe coder you may be yelling at the screen saying “but if I throw enough different models at it, it will catch all the bugs”. This approach predates vibe coding, and is not even specific to unit tests. The Swiss cheese model is a risk management strategy that means rather than trying to catch every possible issue, you put various safeguards in place whose gaps in effectiveness will most likely not line up.
If you stack several pieces of Swiss cheese next to each other, it’s unlikely that something can pass through in a straight line without eventually hitting cheese.

This model is used across many industries, and including a Swiss cheese approach in your unit tests is never going to be a bad idea. In fact, asking my friend to test my contracts with Claude would be considered Swiss cheesing.
But relying entirely on this, completely unaware of where those gaps are, or what your tests are doing, is a recipe for disaster whose risk grows with the amount of funds your smart contract will be handling.
Would you feel confident with LLM-written tests if $10,000 was on the line? How about $100k, or $1 million? I have built projects whose smart contracts handled user funds in excess of this; if I hadn’t known with certainty that those funds were secure, there is no way I could have slept calmly at night.
Code Coverage
Some of you may also be saying “but as long as I tell my LLM to ensure 100% code coverage, it will be fine”. This is incorrect.
While having code coverage below 100% is generally a sign your tests are incomplete, having it at 100% does not automatically mean that you have caught all behaviors and edge cases. You can touch every line of code without fully exploring all the things those lines can do.
Unless you fully understand what your code is supposed to do, how it exists in a real world context, and all the ways it can go wrong, your tests are just a stress relief placebo.
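A hypothetical TypeScript sketch of the gap between line coverage and behaviour coverage (illustrative, not from any real contract): one happy-path call executes every line of this fee function, yet a dangerous input range is never exercised.

```typescript
// Hypothetical fee calculation in basis points (1 bps = 0.01%).
function applyFee(amount: bigint, feeBps: bigint): bigint {
  // Single line of logic: any one call yields 100% line coverage.
  return amount - (amount * feeBps) / 10_000n;
}

// This "complete" suite touches every line and passes...
if (applyFee(1_000n, 50n) !== 995n) throw new Error("0.5% fee test failed");

// ...but nothing ever tries feeBps above 10,000, where the result goes
// negative: applyFee(1_000n, 20_000n) === -1_000n. A contract naively
// paying out applyFee(...) would owe more than was deposited.
```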
Humans Are Fallible Too
Of course, none of this is to say that tests are flawless just because a human wrote them. Far from it. There are plenty of incompetent devs out there, and plenty of competent devs who make mistakes. I am in one of those two categories for sure, but I couldn’t say which.
A few years ago I wrote about a bug I found and patched in OpenZeppelin’s public ERC-721 implementation that their tests had all missed, and that had cost users about $40 million USD in wasted gas. A false positive meant that every transaction included additional, unneeded operations that were burning users’ money every time they used an NFT.
So the point is not that people can’t make mistakes; the point is that unit tests must be written by someone who can fully understand the problem the smart contract is trying to solve. You essentially need to try to break your own design in the most creative ways possible. An LLM isn’t a thinking machine and can’t conceptualise anything; it can’t do out-of-the-box problem solving or get into the head of potential users to try to exploit their behaviour. Maybe one day we will have an AI that can do this, but for now we just have the token prediction machine.
The Framework
Now that we’ve covered the why, let’s get to the what. The framework is built to work within Anchor, using Mocha (JS) for tests. It uses LiteSVM instead of the Solana Test Validator or the newer Bankrun; the former lacks functionality, and the latter is deprecated.
Why JS tests?
There are two main ways to write unit tests for Solana programs: Rust and JS. Some say that Rust is better because it’s a much more precise language; why would you unit test your Rust smart contract code in a less precise language?
This is a valid opinion, and if you are already writing your own tests in Rust then odds are this entire framework isn’t for you since you probably know what you’re doing.
But I find JS tests advantageous for two reasons:
1. Personal familiarity with JS
I, like many devs, have been writing JS in one form or another for my entire professional life. Given that I got my first coding job in high school, this means about 20 years for me. Yes, I am old. So writing JS comes very naturally to me, and I can have a greater degree of confidence in the quality of my tests, can write more automatically, and can focus more on making sure the program code is correct.
2. Frontend (and maybe backend) code will be in JS
If you are a full-stack dev and not just responsible for writing the contracts and testing them, then odds are you will have work further along the stack that requires you to implement your contracts in some way. Either in a frontend for a dapp, or in some backend service. This means that you will more than likely need to sling some JS that interacts with your contracts anyway, so if you can get a head start on that and write some re-usable code during tests, you may have saved a day or two of work.
As someone who has solo-deved many projects in this space, the time savings in this regard are not insignificant. If you later need to make contract-level changes (and thus modify your tests), it lowers the extra work required to bring the rest of the stack in line.
Functionality
If you want to get straight to the biscuits, the full documentation is available on both GitHub and NPM. It also includes an example Anchor project that implements the framework for tests.
Rather than re-hashing everything in the readme, I’ll just mention the major functionality.
- Environment management: easy management of LiteSVM environment, signers and accounts.
- Program helpers: clear and simple program helpers that make programs easier to interact with.
- Tx failure and success: specific functions designed to aid in testing specific transaction failures.
Example
I’ve taken some code from the example tests and bundled them into a single test so you can see some of the syntax. The example repo has the smart contract code and full tests, so for proper context go there.
it("Test some stuff", async () => {
  // Re-initialise the client object
  const client = createNewClient();
  // Init the SLAM environment
  initEnvironment(client, signers);
  // Initialise the program helper
  program = createProgram<SlamExample>(IDL);
  // Set current signer to signer0
  setSigner(signer0);
  // Clear any added accounts from previous tests
  clearAddedAccounts();

  // Assert that the tx succeeds
  await succeeds(async () => {
    await program.createMyAccount();
  });

  // Try again and fail
  await fails(async () => {
    await program.createMyAccount();
  });

  // Some vars that will be used in the next tx
  const newBool = true;
  const newPubKey = signer1.publicKey;
  const newU64 = RBInt(0, (2n ** 64n) - 1n);

  // Get the PDA address
  account = program.pda(["me", getSigner().publicKey]);

  // Add it to the accounts provided to txs
  addAccounts({
    accountToUpdate: account,
  });

  // Bool value that will cause it to fail
  const incorrectBool = false;

  // failsCorrectly will cause the test to fail unless the
  // tx fails with this specific program error.
  await failsCorrectly(async () => {
    await program.updateAccount(incorrectBool, newPubKey, newU64);
  }, "Bool Must Not Match");

  // Successfully update the account
  await succeeds(async () => {
    await program.updateAccount(newBool, newPubKey, newU64);
  });
});

Solana SLAM
It’s difficult to say for sure, but I suspect readers are now split into two camps: those who lost interest and stopped reading long ago, and those who were interested enough to go view the project proper and are no longer reading either. If you are in the latter camp, I hope you get some benefit from this framework. It’s something I will continue to use myself, and I will build upon it if there is any demand from the dev community.
I wish you many green ticks next to your tests, and few rewrites.
If you are looking for a 10 year veteran smart contract dev with full stack Solana and Ethereum skills who writes contracts that are properly tested, you can contact me on:
Solana SLAM: A Framework to Make Unit Tests Easier was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.