So it was pointed out to me, rightly, that as a tester I had better write some tests for my code. Makes sense.
So I started again and decided to do TDD on this project all the way. Lead by example, right? Now, I’ve never done TDD. I’ve seen it done and I’ve worked with people doing it, but personally it’s never come onto my radar.
So I sat down to write a test for the simplest element of the project, or so I thought. I decided that I would make sure that when presented with a URL, I could check its status and add it to a dictionary of URLs and statuses which would form the end result.
Terrible, terrible idea. First thing is to write the tests, so I pass the function (let’s call it “status_checker”) a URL. But wait: to check the URL I’m going to need a server, which is going to need HTML pages to serve, and then I’m going to need to start the server up before each test and tear it down afterwards. Which is… messy. It’s a lot of code for nearly no return and, frankly, if that breaks it’s going to be immediately obvious. All my lurking doubts about TDD came to the fore and broke out in a wave of negativity. It’s slow, it’s hard, and you end up with a lot more code to maintain than if you had just written the program normally and added some decent tests afterwards.
But I knew that this was my native scepticism at work; I am always leery of new ways of doing things when the old ways already work.
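One common way around the server problem is to have the status check take the fetching step as a parameter, so a test can hand it a fake instead of a live server. This is a minimal sketch, assuming Python’s standard `urllib` for the real request; `fetch_status` and the `fetch` parameter are hypothetical names for illustration, not code from this project.

```python
import unittest
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for url (a thin wrapper around urllib)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses raise HTTPError, which still carries the code
        return err.code


def status_checker(url, statuses, fetch=fetch_status):
    """Look up url's status with fetch and record it in the statuses dict."""
    statuses[url] = fetch(url)
    return statuses[url]


class StatusCheckerTest(unittest.TestCase):
    def test_records_status_in_dictionary(self):
        statuses = {}
        # A fake fetcher stands in for the network, so no server is needed
        status_checker("http://example.com/", statuses, fetch=lambda url: 200)
        self.assertEqual(statuses, {"http://example.com/": 200})

    def test_keeps_earlier_results(self):
        statuses = {"http://example.com/": 200}
        status_checker("http://example.com/missing", statuses, fetch=lambda url: 404)
        self.assertEqual(statuses, {"http://example.com/": 200,
                                    "http://example.com/missing": 404})
```

Running `python -m unittest` in the file’s directory picks up the test class. The real `fetch_status` is never exercised by these tests; only the logic around it is.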
So, next thing: look for a part of the logic that’s easy and simple to write a test for, then work from there. I wrote some tests to make sure that I didn’t request the same URL repeatedly. Not only does this limit our requests, it also stops us getting stuck in a loop if two pages link to each other. Very important.
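A minimal sketch of that deduplication logic and its tests, assuming the crawler keeps a `set` of URLs it has already queued; `urls_to_visit` is a hypothetical name chosen for illustration.

```python
import unittest


def urls_to_visit(found_links, visited):
    """Return the links in found_links that have not been visited yet.

    Each returned link is added to visited, so it is never returned again.
    That both avoids repeat requests and stops pages that link to each
    other from sending the crawler round in a loop.
    """
    new_links = []
    for link in found_links:
        if link not in visited:
            visited.add(link)
            new_links.append(link)
    return new_links


class UrlsToVisitTest(unittest.TestCase):
    def test_skips_urls_already_visited(self):
        visited = {"http://example.com/a"}
        self.assertEqual(
            urls_to_visit(["http://example.com/a", "http://example.com/b"], visited),
            ["http://example.com/b"],
        )

    def test_repeated_link_on_one_page_is_returned_once(self):
        self.assertEqual(
            urls_to_visit(["http://example.com/b", "http://example.com/b"], set()),
            ["http://example.com/b"],
        )

    def test_pages_linking_to_each_other_do_not_loop(self):
        visited = {"http://example.com/a"}
        # Page a links to b, which is new...
        self.assertEqual(urls_to_visit(["http://example.com/b"], visited),
                         ["http://example.com/b"])
        # ...and b links back to a, which has already been visited
        self.assertEqual(urls_to_visit(["http://example.com/a"], visited), [])
```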
So I wrote some tests and then wrote the code to satisfy them and… it was completely unsatisfying. It made me write my code in a fudged-up way, splitting out logic in ways I wasn’t happy with. The problem was easy to define and, to me, the definition dictated the solution without my jumping through the hoops of TDD. I already knew how to write this bit and I knew it’d work. I spent more time working out how to get unittest to assert equality on tuples than I did writing the code that made the tests pass! All this for a very simple function that I could write without too much thought and be sure would work as I expected.
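For tuples, `unittest` already does most of the work: when both sides are tuples, `assertEqual` hands the comparison to `assertTupleEqual`, which reports the first element that differs. This sketch assumes the tuples are `(url, status)` pairs, which is a guess at the shape used here. One gotcha worth knowing is that a list never compares equal to a tuple, even with the same elements.

```python
import unittest


class TupleEqualityTest(unittest.TestCase):
    def test_assert_equal_handles_tuples(self):
        # Both sides are tuples, so assertEqual delegates to assertTupleEqual
        # and a failure would point at the first element that differs
        self.assertEqual(("http://example.com/", 200), ("http://example.com/", 200))

    def test_assert_tuple_equal_also_checks_type(self):
        # assertTupleEqual fails if either argument is not a tuple at all
        self.assertTupleEqual(("http://example.com/", 200), ("http://example.com/", 200))

    def test_list_is_never_equal_to_tuple(self):
        # Same elements, different container type: these are not equal
        self.assertNotEqual(["http://example.com/", 200], ("http://example.com/", 200))

    def test_unordered_results(self):
        # assertCountEqual ignores order but still counts duplicates
        found = [("http://example.com/b", 404), ("http://example.com/a", 200)]
        expected = [("http://example.com/a", 200), ("http://example.com/b", 404)]
        self.assertCountEqual(found, expected)
```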
I took myself off to the Testers Slack (testers.io) and asked where I was going wrong. After some discussion I worked out a couple of things:
- I had already written the program, more or less, in my head. Writing tests, for me, wasn’t defining the problem; it was putting a step I didn’t feel I needed between me and working code. I was writing them to test the code that already existed in my mind.
- I hadn’t fully specced out what was in my head, so other people couldn’t appreciate my design decisions.
The first point was pretty simple. To me the problem was well defined: I could see what I was going to have to do and, in broad strokes, how I could do it. All that was left were the syntactic niceties and the finer details of how I’d write the program. I knew what libraries I’d use, and I’d worked out some data structures that would suit the problem. I knew what was going to happen. I wasn’t letting the tests define the project.
Secondly, I had many, many implicit ideas about how the program should run that I didn’t make clear, largely because to me they were blindingly obvious. Of course you’d want to limit recursion depth to a user-defined level, wouldn’t you? You’d definitely want to thread it, because blocking on each request would slow it to a crawl, surely? Why would you wrap a library call in a function when the extra call is so computationally expensive? Basically I had, in my head, the unwritten requirement that the solution be scalable and fast, which is why I had already settled on threading to parallelise the requests.
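Those two implicit requirements, a user-defined depth limit and parallel requests, can be sketched together as a breadth-first crawl over a thread pool. This assumes the standard `concurrent.futures.ThreadPoolExecutor` for the threading; `crawl` and `get_links` are hypothetical names, and `get_links` stands in for whatever fetches a page and extracts its links, so the sketch can be exercised without a network.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(start_url, get_links, max_depth, workers=8):
    """Breadth-first crawl from start_url, following at most max_depth hops.

    get_links(url) -> list of URLs found on that page (hypothetical; in a
    real scanner it would fetch the page and parse its anchors). All pages
    at one depth are fetched in parallel on a thread pool.
    """
    visited = {start_url}
    frontier = [start_url]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth):
            next_frontier = []
            # pool.map runs get_links concurrently; results come back in input order
            for links in pool.map(get_links, frontier):
                for link in links:
                    if link not in visited:
                        # The visited set is what stops A -> B -> A loops
                        visited.add(link)
                        next_frontier.append(link)
            if not next_frontier:
                break
            frontier = next_frontier
    return visited
```

With a fake link graph such as `loop = {"a": ["b"], "b": ["a"]}`, `crawl("a", loop.get, max_depth=5)` returns `{"a", "b"}` and stops, because `a` is already in the visited set when `b` links back to it.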
These were not necessarily bad things, but they were definitely a legacy of a more traditional mindset. I’ve always written the code first and the tests later, so that I know what it is I’m testing. As soon as I had defined the problem in my mind, I was already writing the code in my head. So I was writing tests based on code I had already written: I knew that one function was low-hanging fruit, so I wrote some tests for it and was then upset that they added to the development time and didn’t tell me anything reading the code couldn’t.
This isn’t how it should work.
I need to throw away my ideas about the program and define the problem properly, then write tests that describe that problem, and only then write code to make them pass.
However, I am still unconvinced of the value of this methodology. It will force me to wrap library calls in functions so that the logic that uses them becomes easily testable. In Python this is potentially computationally expensive, and I really dislike writing intentionally inefficient code. I write enough inefficient code by mistake without doing it on purpose.
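Whether a wrapper’s cost matters can be measured rather than guessed. This is a minimal sketch using the standard `timeit` module; `library_call` is a hypothetical stand-in for a real library function, and the figures it prints will vary by machine and Python version. For a link scanner, the useful comparison is between the overhead it reports and the time a single HTTP request takes.

```python
import timeit


def library_call(x):
    # Hypothetical stand-in for a real library function, for timing only
    return x + 1


def wrapped_call(x):
    # The thin wrapper TDD pushes towards: one extra Python-level call
    return library_call(x)


def per_call_seconds(func, number=1_000_000):
    """Average seconds per call of func(1), measured with timeit."""
    return timeit.timeit(lambda: func(1), number=number) / number


if __name__ == "__main__":
    direct = per_call_seconds(library_call)
    wrapped = per_call_seconds(wrapped_call)
    print(f"direct:   {direct * 1e9:.0f} ns per call")
    print(f"wrapped:  {wrapped * 1e9:.0f} ns per call")
    print(f"overhead: {(wrapped - direct) * 1e9:.0f} ns per call")
```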
Still, I’ve started, so I should finish, and readability counts. I could really be thinking about this all wrong, in which case hopefully working through this project will make it click. If nothing else, I’ll have a lovely test suite for my link scanner, even if it isn’t written quite how I had initially envisaged it.