GitHub Copilot
Is GitHub Copilot a useful assistant or rather a waste of money and a hindrance to work? My personal experience giving it a try.
First of all: I am quite critical of these systems, so you should read this article accordingly. There are certainly people with a much more optimistic and risk-friendly view than mine.
After several months of playing around with GitHub Copilot, my opinion is ... mixed. Since the price of Copilot is relatively low and it is easy to integrate, this is not necessarily a problem. However, there are other solutions, and if you already use them, you might not gain much by adding Copilot to the mix. In theory you can even use local LLMs, but these usually have the disadvantage that you can only run models up to a certain size: even a 4090 folds at just over 20 GB, and the significantly smaller models often have substantially more downsides and are therefore only of very limited use.
What Copilot can do quite well, once it has digested the code, is autocomplete on steroids. You just have to be careful not to start waiting for the system; I've caught myself doing that several times. It's a risk if you are that kind of person. In most languages and well-known frameworks, creating skeletons is not a major problem either. However, the usual tools are in almost all cases much better and faster at this.
With these systems, however, the more interesting question is probably how well the thing can code on its own. Can it replace me, or at least downgrade me to supervising it? And the answer to that is a very loud no ... unless you want to go crazy doing that.
These systems can support experienced developers quite well with rather tedious tasks, though it depends on what you are currently doing. For beginners, however, these systems are extremely dangerous, because they screw up a lot and you need to be able to see that.
LLMs are not continuously updated. This is true for all of these systems: Copilot, ChatGPT and all the others. Copilot itself also uses several systems from other providers, OpenAI's GPT models being among them. These models are trained once and then used. You can make small adjustments afterwards, but only to a very limited extent. This means that the knowledge cutoff date (the date after which no information is available to the model) is usually months in the past. GPT-4o, for example, was released in May 2024, but its knowledge cutoff is October 2023. GPT-4 Turbo's cutoff is April 2023, and GPT-4's is September 2021. Any potential updates to the older systems are not taken into account here.
This results in a couple of rather obvious problems that are particularly relevant for coding. If you give these systems more freedom, especially as a beginner, you very quickly run the risk of using outdated libraries that either should no longer be used at all or have known security vulnerabilities.
A few weeks ago I was presented with code that immediately showed it came out of an AI. Among other things, there were 19 pending security updates and a bizarre selection of versions that made no sense at all. These bizarre decisions, which cannot really be explained, result from the fact that these systems exist in the past. They do not know that 1.x was phased out a few months ago, or that 2.0.0 had a really nasty security issue and you really want to use 2.0.1 instead. It's simply not (yet) known to the AI, and that might remain the case for months.
And this is a huge risk for beginners who don't have this on their radar, or for people who don't know how these systems work. With outdated code come not only frequent security problems but also very basic things like "nobody's using this shit anymore". This is particularly a problem with languages and frameworks that (still) have a high release frequency, for example many Rust and Go libraries. Things like PHP with Symfony or Laravel are less of a problem because they move quite slowly. But you have to keep your eyes wide open for version bumps of these frameworks.
The other big problem is wrong code that works. And that is a particularly nasty problem. Some time ago I did a test with Dijkstra's algorithm, which finds the shortest paths between nodes in a graph. I let Copilot write the entire thing in several languages and then had a look at the results.
In my tests, the result was only correct in PHP, which surprised me because PHP is not often at the top of such lists. In C and Rust the compiler refused to accept the code, and in Go there was an obvious problem in the code. But Python, of all things and certainly least expected, produced code that worked and looked pretty correct ... at first glance. But it wasn't. And that is extremely dangerous, because it produced code that ran without any problems ... but not correctly. The error was so subtle that I had to look at it carefully to spot it.
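For reference, this is roughly what a correct Dijkstra looks like in Python. This is my own minimal sketch using a binary heap, not Copilot's output; the graph format and function name are my choices. The stale-entry check in the loop is exactly the kind of detail that is easy to get subtly wrong: without it, the code still runs fine on most inputs.

```python
import heapq

def dijkstra(graph, start):
    """Shortest distances from start to every reachable node.

    graph: dict mapping node -> list of (neighbor, weight) pairs,
    with non-negative weights (a Dijkstra precondition).
    Returns a dict node -> shortest distance; unreachable nodes are absent.
    """
    dist = {start: 0}
    heap = [(0, start)]  # (distance, node), smallest distance first
    while heap:
        d, node = heapq.heappop(heap)
        # Skip stale heap entries: a shorter path to this node
        # was already finalized after this entry was pushed.
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2)],
    "c": [],
}
print(dijkstra(graph, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

The point is not the fifteen lines themselves but that you have to read them this carefully before trusting generated code: the direct edge a→c costs 4, yet the correct answer goes via b for a total of 3, and a subtly broken version can still return plausible numbers.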
Letting Copilot actually write code on its own is a risk that should not be taken, and it is often not productive anyway. In many cases you have to fix too much shit to get it into a proper state. You also might need to communicate way too much with the system to push it in the direction you want. In the meantime, you could simply have done it yourself.
You also shouldn't be blinded by the boatloads of stupid examples on social media. If I need an application that already exists as a tutorial in every single language (e.g. a shopping list) and have no requirements for the implementation ... yes, it can do that, and it can do it somewhat well. However, you should be realistic: this very specific situation is fairly rare in the real world. Apart from that, the aforementioned problems regularly show up if you take a closer look. The same applies to the usual suspects from job interviews. These are generally not very creative, you see the same tasks in slightly different forms again and again, and the systems are usually well trained on them. This is also often not what you are actually doing right now, is it?
My conclusion about Copilot, and many other systems like it, is that they very often work well as autocomplete on steroids, as long as you don't start letting the system dictate the way you work. That in itself can justify their use. For everything else, you should probably avoid them. They are not even Stack Overflow, which at least has the advantage that someone might have noticed the code is actually bullshit or hopelessly outdated.