“No Duh,” say senior developers everywhere.

The article explains that vibe code is often close to functional, but not quite, requiring developers to go in and find where the problems are, resulting in a net slowdown of development rather than productivity gains.

  • Baguette@lemmy.blahaj.zone
    13 upvotes · 24 days ago

    I’d be inclined to try using it if it was smart enough to write my unit tests properly, but it’s great at inserting the same mock twice and ending up with 0 working unit tests.

    I might try using it to generate some Javadoc though… then when my org inevitably starts polling how much AI I use I won’t be in the gutter lol

    • sugar_in_your_tea@sh.itjust.works
      18 upvotes · 24 days ago

      I personally think unit tests are the worst application of AI. Tests are there to ensure the code is correct, so ideally the dev would write the tests to verify that the AI-generated code is correct.

      I personally don’t use AI to write code, since writing code is the easiest and quickest part of my job. I instead use it to generate examples of using a new library, give me comparisons of different options, etc, and then I write the code after that. Basically, I use it as a replacement for a search engine/blog posts.

      • MangoCats@feddit.it
        3 upvotes · 24 days ago

        Ideally, there are requirements before anything, and some TDD types argue that the tests should come before the code as well.

        Ideally, the customer is well represented during requirements development - ideally, not by the code developer.

        Ideally, the code developer is not the same person that develops the unit tests.

        Ideally, someone other than the test developer reviews the tests to ensure that the tests do in fact provide requirements coverage.

        Ideally, the modules that come together to make the system function have similarly tight requirements and unit-tests and reviews, and the whole thing runs CI/CD to notify developers of any regressions/bugs within minutes of code check in.

        In reality, some portion of that process (often, most of it) is short-cut for one or many reasons. Replacing the missing bits with AI is better than not having them at all.

        • sugar_in_your_tea@sh.itjust.works
          7 upvotes · 24 days ago

          > Ideally, the code developer is not the same person that develops the unit tests.

          Why? The developer is exactly the person I want writing the tests.

          There should also be integration tests written by a separate QA, but unit tests should 100% be the responsibility of the dev making the change.

          > Replacing the missing bits with AI is better than not having them at all.

          I disagree. A bad test is worse than no test, because it gives you a false sense of security. I can identify missing tests with coverage reports, I can’t easily identify bad tests. If I’m working in a codebase with poor coverage, I’ll be extra careful to check for any downstream impacts of my change because I know the test suite won’t help me. If I’m working in a codebase with poor tests but high coverage, I may assume a test pass indicates that I didn’t break anything else.
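
A minimal sketch of the coverage point, with made-up names (`apply_discount` and both tests are hypothetical, not from anyone's codebase in this thread). The first test executes every line and branch, so a coverage report would show 100%, yet it would pass for almost any implementation. The second pins the expected value and fails on the deliberate bug.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test, with a deliberate bug."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    # Bug: this should be price * (1 - percent / 100).
    return price - percent


def weak_test() -> None:
    """Touches every line (full coverage) but asserts almost nothing."""
    assert apply_discount(200.0, 10.0) is not None
    try:
        apply_discount(200.0, 150.0)
    except ValueError:
        pass


def strict_test() -> None:
    """Pins the expected value, so the bug above makes it fail."""
    assert apply_discount(200.0, 10.0) == 180.0
```

Running `weak_test()` passes silently, while `strict_test()` raises `AssertionError`. A coverage report cannot tell the two apart, which is the gap described above.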

          If a company is going to rely heavily on AI for codegen, I’d expect tests to be manually written and have very high test coverage.

          • Nalivai@lemmy.world
            2 upvotes · 24 days ago

            > Why? The developer is exactly the person I want writing the tests.

            It’s better if it’s a different developer, so they don’t know the nuances of your implementation and test functionality only, which avoids some mistakes. You’re correct on all the other points.

            • sugar_in_your_tea@sh.itjust.works
              2 upvotes · 24 days ago

              I really disagree here. If someone else is writing your unit tests, that means one of the following is true:

              • the tests are written after the code is merged - there will be gaps, and the second dev will be lazy in writing those tests
              • the tests are written before the code is worked on (TDD) - everything would take twice as long because each dev essentially needs to write the code again, and there’s no way you’re going to consistently cover everything the first time

              Devs should write their tests, and reviewers should ensure the tests do a good job covering the logic. At the end of the day, the dev is responsible for the correctness of their code, so this makes the most sense to me.

              • Nalivai@lemmy.world
                1 upvote · 24 days ago

                > the tests are written after the code is merged - there will be gaps, and the second dev will be lazy in writing those tests

                I don’t really see how this follows. Why does the second one necessarily have to be lazy, and what stops the first one from being lazy as well?
                The reason I like it to be different people is so there are two sets of eyes looking at the same problem without the need for doing a job twice. If you miss something while implementing, it’s easier for you to miss it during test writing. It’s very hard to switch to testing the concept and not the specific implementation, but if you weren’t the one implementing it, you’re not “married” to the code and it’s easier for you to spot the gaps.

                • sugar_in_your_tea@sh.itjust.works
                  1 upvote · 23 days ago

                  Devs are more invested in code they wrote themselves. When I’m writing tests for something I didn’t write, I’m less personally invested in it. Looking at PRs by other devs when we do pushes for improving coverage, I’m not alone here. That’s just human psychology, you care more about things you built than things you didn’t.

                  I think testing should be an integral part of the dev process. I don’t think any code should be merged until there are tests proving its correctness. Having someone else write the tests encourages handing tests to jr devs since they’re “lower priority.”

                  • Nalivai@lemmy.world
                    1 upvote · 5 days ago

                    > Devs are more invested in code they wrote themselves. When I’m writing tests for something I didn’t write, I’m less personally invested in it.

                    This, I think, is a very bad part of the problem and shouldn’t be happening regardless

            • MangoCats@feddit.it
              1 upvote · 24 days ago

              I’m mixed on unit tests - there are some things the developer will know (white box) about edge cases etc. that others likely wouldn’t, and they should definitely have input on those tests. On the other hand, independence of review is a very important aspect of “harnessing the power of the team.” If you’ve got one guy who gathers the requirements, implements the code, writes the tests, and declares the requirements fulfilled, that better be one outstandingly brilliant guy with all the time on his hands he needs to do the jobs right. If you’re trying to leverage the talents of 20 people to make a better product, having them all be solo-virtuoso actors working independently alongside each other is more likely to create conflict, chaos, duplication, and massive holes of missed opportunities and unforeseen problems in the project.

              • Nalivai@lemmy.world
                1 upvote · 24 days ago

                > independence of review is a very important aspect of “harnessing the power of the team.”

                Yep, that’s basically my rationale

          • MangoCats@feddit.it
            2 upvotes · 24 days ago

            > but unit tests should 100% be the responsibility of the dev making the change.

            True enough

            > A bad test is worse than no test

            Also agree, if your org has trimmed to the point that you’re just making tests to say you have tests, with no review as to their efficacy, they will be getting what they deserve soon enough.

            If a company is going to rely heavily on AI for anything I’d expect a significant traditional human employee backstop to the AI until it has a track record. Not “buckle up, we’re gonna try somethin’” track record, more like two or three full business cycles before starting to divest of the human capital that built the business to where it is today. Though, if your business is on the ropes and likely to tank anyway… why not try something new?

            Was a story about IBM letting thousands of workers go, replacing them with AI… then hiring even more workers in other areas with the money saved from the AI retooling. Apparently they let a bunch of HR and other admin staff go and beefed up on sales and product development. There are some jobs that you want more predictable algorithms in than potentially biased people, and HR seems like an area that could have a lot of that.

        • Nalivai@lemmy.world
          3 upvotes · 24 days ago

          > Replacing the missing bits with AI is better than not having them at all.

          Nah, bullshit tests that pretend to be tests but are essentially “if true == true then pass” are significantly worse than no test at all.

          • MangoCats@feddit.it
            1 upvote · 24 days ago

            > bullshit tests that pretend to be tests but are essentially “if true == true then pass” are significantly worse than no test at all.

            Sure. But, unsupervised developers who: write the code, write their own tests, change companies every 18 months, are even more likely to pull BS like that than AI is.

            You can actually get some test validity oversight out of AI review of the requirements and tests, not perfect, but better than self-supervised new hires.

            • Nalivai@lemmy.world
              1 upvote · 24 days ago

              > You can actually get some test validity oversight out of AI review

              You also will get some bullshit out of it. If you’re in a situation where you can’t trust your developers because they’re changing companies every 18 months, and you can’t even supervise your untrustworthy developers, then you sure as shit can’t trust whatever an LLM generates for you. At least your flock of developers will bullshit you predictably to save time and energy; with an LLM you have zero idea where the lies will come from, and those will be inventive lies.

              • MangoCats@feddit.it
                1 upvote · 23 days ago

                I work in a “tight” industry where we check ALL our code. By contrast, a lot of places I have visited - including some you would think are fairly important like medical office management and gas pump card reader software makers - are not tight, not tight at all. It’s a matter of moving the needle, improving a bad situation. You’ll never achieve “perfect” on any dynamic non-trivial system, but if you can move closer to it for little or no cost?

                Of course, when I interviewed with that office management software company, they turned me down - probably because they like their culture the way it is and they were afraid I’d change things with my history of working places for at least 2.5 years, sometimes up to 12, and making sure the code is right before it ships instead of giving their sales reps that “hands on, oooh I see why you don’t like that, I’ll have our people fix that right away - just for you” support culture.

      • Baguette@lemmy.blahaj.zone
        3 upvotes · 24 days ago

        To preface: I don’t actually use AI for anything at my job, which might be a bad metric, but my workflow is 10x slower if I even try using AI.

        That said, I want AI to be able to do unit tests in the sense that I can write some starting ones, and then it can infer which branches aren’t covered and help me fill in the rest.

        Obviously it’s not smart enough, and honestly I highly doubt it ever will be because that’s the nature of LLMs, but my peeve with unit tests is that testing branches usually entails just copying the exact same test but changing one field to be an invalid value, or a dependency to throw. It’s not hard, just tedious. Branch coverage is already enforced, so you should know when you forgot to test a case.

        Edit: my vision would be an interactive version rather than my company’s current one, where it just generates whatever it wants instantly. I’d want something to prompt me saying this branch is not covered, and then tell me how it will try to cover it. It eliminates the tedious work but still lets the dev know what they’re doing.

        I also think you should treat AI code as a pull request and actually review what it writes. My coworkers that do use it don’t really proofread, so it ends up having some bad practices and code smells.

        • sugar_in_your_tea@sh.itjust.works
          1 upvote · 24 days ago

          > testing branches usually entails just copying the exact same test but changing one field to be an invalid value, or a dependency to throw

          That’s what parameterization is for. In unit tests, most dependencies should be mocked, so expecting a dependency to throw shouldn’t really be a thing much of the time.

          > I’d want something to prompt me saying this branch is not covered, and then tell me how it will try to cover it

          You can get the first half with coverage tools. The second half should be fairly straightforward, assuming you wrote the code. If a branch is hard to hit (i.e. it happens if an OS or library function fails), either mock that part or don’t bother with the test. I ask my team to hit 70-80% code coverage because that last 20-30% tends to be extreme corner cases that are hard to hit.

          > My coworkers that do use it don’t really proofread, so it ends up having some bad practices and code smells.

          And this is the problem. Reviewers only know so much about the overall context and often do a surface level review unless you’re touching something super important.

          We can make conventions all we want, but people will be lazy and submit crap, especially when deadlines are close.

          • Baguette@lemmy.blahaj.zone
            2 upvotes · 24 days ago

            The issue with my org is that the push to be CI/CD means 90% line and branch coverage, so you end up spending just as much time writing tests as actually developing the feature. And that feature is already on an accelerated schedule because my org has made promises that end up becoming ridiculous deadlines, like a 2 month project becoming a 1 month deadline.

            Mocking is easy, almost everything in my team’s codebase is designed to be mockable. The only stuff I can think of that isn’t mocked are usually just clocks, which you could mock but I actually like using fixed clocks for unit testing most of the time. But mocking is also tedious. Lots of mocks end up being:

            1. Change the test constant expected, which usually ends up being almost the same input just with one changed field.
            2. Change the response answer from the mock
            3. Given the response, expect the result to be x or some exception y
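
A hedged sketch of how those three steps can collapse into one table-driven test, using only the standard library (`unittest.mock.Mock` and `TestCase.subTest`). `total_cost`, the `get_price` dependency and the cases are invented for illustration. The commenter's codebase is not shown, and its language and test framework may differ.

```python
import unittest
from unittest.mock import Mock


def total_cost(client, item_id: str, quantity: int) -> float:
    """Hypothetical code under test: looks up a price and multiplies it."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    price = client.get_price(item_id)  # dependency call that the test mocks
    return price * quantity


class TotalCostTest(unittest.TestCase):
    def test_branches(self):
        # Each row: name, (1) changed input, (2) mock behaviour, (3) expected outcome.
        cases = [
            ("happy path", 2, {"return_value": 5.0}, 10.0),
            ("invalid quantity", 0, {"return_value": 5.0}, ValueError),
            ("dependency throws", 2, {"side_effect": ConnectionError}, ConnectionError),
        ]
        for name, quantity, mock_config, expected in cases:
            with self.subTest(name):
                client = Mock()
                client.get_price = Mock(**mock_config)
                if isinstance(expected, type) and issubclass(expected, Exception):
                    with self.assertRaises(expected):
                        total_cost(client, "sku-1", quantity)
                else:
                    self.assertEqual(total_cost(client, "sku-1", quantity), expected)
```

Adding a branch then means adding one row rather than copying a whole test. The trade-off is that a long table can hide which row matters, so descriptive row names help.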

            Chances are, if you wrote it you should already know what branches are there. It’s just translating that to actual unit tests that’s a pain. Branching logic should be easy to read as well. If I read a nested if statement chances are there’s something that can be redesigned better.

            I also think that 90% of actual testing should be done through integ tests. Unit tests to me help to validate what you expect to happen, but expectations don’t necessarily equate to real dependencies and inputs. But that’s a preference, mostly because our design philosophy revolves around dependency injection.

            • sugar_in_your_tea@sh.itjust.works
              2 upvotes · 24 days ago

              > I also think that 90% of actual testing should be done through integ tests

              I think both are essential, and they test different things. Unit tests verify that individual pieces do what you expect, whereas integration tests verify that those pieces are connected properly. Unit tests should be written by the devs and help them prove their solution works as intended, and integration tests should be written by QA to prove that user flows work as expected.

              Integration test coverage should be measured in terms of features/capabilities, whereas unit tests are measured in terms of branches and lines. My target is 90% for features/capabilities (mostly miss the admin bits that end customers don’t use), and 70-80% for branches and lines (skip unlikely errors, simple data passing code like controllers, etc). Getting the last bit of testing for each is nice, but incredibly difficult and low value.

              > Lots of mocks end up being

              I use Python, which allows runtime mocking of existing objects, so most of our mocks are like this:

              @patch.object(Object, "method", return_value=value)
              

              Most tests have one or two lines of this above the test function. It’s pretty simple and not very repetitive at all. If we need more complex mocks, that’s usually a sign we need to refactor the code.
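
For readers who have not used it, a self-contained sketch of that decorator in action. `Mailer`, `send` and `notify` are invented names. `patch.object` comes from the standard library's `unittest.mock`.

```python
from unittest.mock import patch


class Mailer:
    """Hypothetical dependency that a unit test should never really call."""

    def send(self, to: str, body: str) -> bool:
        raise RuntimeError("would talk to a real mail server")


def notify(mailer: Mailer, user: str) -> str:
    """Hypothetical code under test."""
    ok = mailer.send(user, "hello")
    return "sent" if ok else "failed"


@patch.object(Mailer, "send", return_value=True)
def test_notify_sent(mock_send):
    # The decorator passes the created mock in as an extra argument.
    assert notify(Mailer(), "a@example.com") == "sent"
    mock_send.assert_called_once_with("a@example.com", "hello")


@patch.object(Mailer, "send", return_value=False)
def test_notify_failed(mock_send):
    assert notify(Mailer(), "a@example.com") == "failed"
```

The patch is undone when each test returns, so `Mailer.send` is the real method again afterwards.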

              > dependency injection

              I absolutely hate dependency injection, most of the time. 99% of the time, there are only two implementations of a dependency, the standard one and a mock.

              If there’s a way to patch things at runtime (e.g. Python’s unittest.mock lib), dependency injection becomes a massive waste of time with all the boilerplate.

              If there isn’t a way to patch things at runtime, I prefer a more functional approach that works off interfaces where dependencies are merely passed as needed as data. That way you avoid the boilerplate and still get the benefits of DI.
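
One possible reading of "dependencies are merely passed as needed as data" is plain function parameters with sensible defaults. This is an interpretation, not necessarily what the commenter has in mind, and `is_expired`, `utc_now` and the `now` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def utc_now() -> datetime:
    """Default clock used in production."""
    return datetime.now(timezone.utc)


def is_expired(
    issued_at: datetime,
    ttl: timedelta,
    now: Callable[[], datetime] = utc_now,
) -> bool:
    """The clock is just an argument: no container, no interface class."""
    return now() - issued_at >= ttl


def test_is_expired_with_fixed_clock() -> None:
    fixed = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    issued = fixed - timedelta(minutes=30)
    # Tests pass a fixed clock; production code relies on the default.
    assert is_expired(issued, timedelta(minutes=15), now=lambda: fixed)
    assert not is_expired(issued, timedelta(hours=1), now=lambda: fixed)
```

Callers that do not care about the clock never mention it, which is where the boilerplate saving comes from in this sketch.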

              That said, dependency injection has its place if a dependency has several implementations. I find that’s pretty rare, but maybe it’s more common in your domain.

        • MangoCats@feddit.it
          1 upvote · 24 days ago

          A software tester walks into a bar, he orders a beer.

          He orders -1 beers.

          He orders 0 beers.

          He orders 843909245824 beers.

          He orders duck beers.

          AI can be trained to do that, but if you are in a not-well-trodden space, you’ll want to be defining your own edge cases in addition to whatever AI comes up with.

          • ganryuu@lemmy.ca
            5 upvotes · 24 days ago

            Way I heard this joke, it continues with:

            A real customer enters.

            He asks where the toilets are.

            The bar explodes.

      • FishFace@lemmy.world
        2 upvotes · 24 days ago

        The reason tests are a good candidate is that there is a lot of boilerplate and no complicated business logic. It can be quite a time saver. You probably know some untested code in some project - you could get an LLM to write some tests that would at least poke some key code paths, which is better than nothing. If the tests are wrong, it’s barely worse than having no tests.

        • theolodis@feddit.org
          5 upvotes · 24 days ago

          Wrong tests will make you feel safe. And in the worst case, the next developer that is going to port the code will think that somebody wrote those tests with intention, and potentially create broken code to make the test green.

          • sugar_in_your_tea@sh.itjust.works
            4 upvotes · 24 days ago

            Exactly! I’ve seen plenty of tests where the test code was confidently wrong and it was obvious the dev just copied the output into the assertion instead of asserting what they expect the output to be. In fact, when I joined my current org, most of the tests were snapshot tests, which automated that process. I’ve pushed to replace them with better tests, and we caught bugs in the process.

          • FishFace@lemmy.world
            1 upvote · 24 days ago

            Then write comments in the tests that say they haven’t been checked.

            That is indeed the absolute worst case though, and most of the tests that are so produced will be giving value because checking a test is easier than checking the code (this is kind of the point of tests) and so most will be correct.

            The risk of regressions covered by the good tests is higher than someone writing code to the rare bad test that you’ve marked as suspicious because you (for whatever reason) are not confident in your ability to check it.

        • sugar_in_your_tea@sh.itjust.works
          2 upvotes · 24 days ago

          > better than nothing

          I disagree. I’d much rather have a lower coverage with high quality tests than high coverage with dubious tests.

          If your tests are repetitive, you’re probably writing your tests wrong, or at least focusing on the wrong logic to test. Unit tests should prove the correctness of business logic and calculations. If there’s no significant business logic, there’s little priority for writing a test.

          • FishFace@lemmy.world
            1 upvote · 24 days ago

            The actual risk of those tests being wrong is low because you’re checking them.

            If your tests aren’t repetitive they’ve got no setup or mocking in so they don’t test very much.

            • sugar_in_your_tea@sh.itjust.works
              1 upvote · 24 days ago

              If your test code is repetitive, you’re not following DRY sufficiently, or the code under test is overly complicated. We’ll generally have a single mock or setup code for several tests, some of which are parameterized. For example, in Python:

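              from parameterized import parameterized  # third-party "parameterized" package

              # key/value and the exception classes are placeholders for real cases.
              @parameterized.expand([
                  (key, value, ExpectedException),
                  (other_key, other_value, OtherExpectedException),
              ])
              def test_exceptions(self, key, value, exception_class):
                  obj = setup()
                  setattr(obj, key, value)

                  with self.assertRaises(exception_class):
                      func_to_test(obj)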
              

              Mocks are similarly simple:

              @unittest.mock.patch.object(Class, "method", return_value=...)
              
              dynamic_mock = MagicMock(Class)
              dynamic_mock...
              

              How this looks will vary in practice, but the idea is to design code such that usage is simple. If you’re writing complex mocks frequently, there’s probably room for a refactor.
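
A runnable sketch of the `MagicMock(Class)` form, with invented names (`PaymentGateway`, `checkout`). The first positional argument of `MagicMock` is `spec`, so the mock only exposes attributes that exist on the class, which catches typos in tests.

```python
from unittest.mock import MagicMock


class PaymentGateway:
    """Hypothetical dependency."""

    def charge(self, amount_cents: int) -> str:
        raise RuntimeError("would hit the network")


def checkout(gateway: PaymentGateway, amount_cents: int) -> str:
    """Hypothetical code under test."""
    return "receipt:" + gateway.charge(amount_cents)


def test_checkout() -> None:
    gateway = MagicMock(PaymentGateway)  # first positional argument is `spec`
    gateway.charge.return_value = "tx-42"
    assert checkout(gateway, 500) == "receipt:tx-42"
    gateway.charge.assert_called_once_with(500)


def test_spec_rejects_unknown_attributes() -> None:
    gateway = MagicMock(PaymentGateway)
    try:
        gateway.refund  # not defined on PaymentGateway
    except AttributeError:
        return
    raise AssertionError("expected AttributeError for an unknown attribute")
```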

              • FishFace@lemmy.world
                1 upvote · 23 days ago

                I know how to use parametrised tests, but thanks.

                Tests are still much more repetitive than application code. If you’re testing a wrapper around some API, each test may need you to mock a different underlying API call. (Mocking all of them at once would hide things). Each mock is different, so you can’t just extract it somewhere; but it is still repetitive.

                If you need three tests each of which requires a (real or mock) user, a certain directory structure to be present somewhere, and input data to be got from somewhere, that’s three things that, even if you streamline them, need to be done in each test. I have been involved in a project where we originally followed the principle of, “if you need a user object in more than one test, put it in setUp or in a shared fixture” and the result was rapidly unwieldy shared setup between tests - and if ever you should want to change one of those tests, you’d better hope you only need to add to it, not to change what’s already there, otherwise you break all the other tests.

                For this reason, zealous application of DRY is not a good idea with tests, and so they are a bit repetitive. That is an acceptable trade-off, but also a place where an LLM can save you some time.

                > If you’re writing complex mocks frequently, there’s probably room for a refactor.

                Ah, the end of all coding discussions, “if this is a problem for you, your code sucks.” I mean, you’re not wrong, because all code sucks.

                LLMs are like the junior dev. You have to review their output because they might have screwed up in some stupid way, but that doesn’t mean they’re not worth having.

                • sugar_in_your_tea@sh.itjust.works
                  1 upvote · 23 days ago

                  > zealous application of DRY is not a good idea with tests

                  I absolutely agree. My point is that if you need complex setup, there’s a good chance you can reuse it and replace only the data that’s relevant for your test instead of constructing it every time.
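
A minimal sketch of "reuse it and replace only the data that's relevant", assuming the setup is a plain data object. `User`, `make_user` and `can_delete_posts` are hypothetical. Each test gets its own copy, so changing one test's data cannot break another.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    """Hypothetical model used only for this illustration."""

    name: str = "alice"
    email: str = "alice@example.com"
    is_admin: bool = False
    active: bool = True


DEFAULT_USER = User()


def make_user(**overrides) -> User:
    """Return a valid default user with only the given fields changed."""
    return replace(DEFAULT_USER, **overrides)


def can_delete_posts(user: User) -> bool:
    """Hypothetical code under test."""
    return user.active and user.is_admin


def test_admin_can_delete() -> None:
    assert can_delete_posts(make_user(is_admin=True))


def test_inactive_admin_cannot_delete() -> None:
    assert not can_delete_posts(make_user(is_admin=True, active=False))
```

Each test names only the fields it cares about, which also documents why the test exists.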

                  But yes, there’s a limit here. We currently have a veritable mess because we populate the database with fixture data so we have enough data to not need setup logic for each test. Changing that fixture data causes a dozen tests to fail across suites. Since I started at this org, I’ve been pushing against that and introduced the repository pattern so we can easily mock db calls.
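
A minimal sketch of the repository idea, under the assumption that it means hiding database access behind a small interface. All names are hypothetical, and a hand-written in-memory fake stands in here for the mock the comment refers to.

```python
from typing import Dict, Optional, Protocol


class UserRepository(Protocol):
    """Hypothetical interface: the only place that knows about the database."""

    def get_email(self, user_id: int) -> Optional[str]:
        ...


class InMemoryUserRepository:
    """Test double backed by a dict instead of shared fixture data."""

    def __init__(self, emails: Dict[int, str]) -> None:
        self._emails = emails

    def get_email(self, user_id: int) -> Optional[str]:
        return self._emails.get(user_id)


def contact_line(repo: UserRepository, user_id: int) -> str:
    """Hypothetical business logic that depends only on the repository."""
    email = repo.get_email(user_id)
    return f"Contact: {email}" if email else "No contact on file"


def test_contact_line() -> None:
    # Each test declares exactly the rows it needs.
    repo = InMemoryUserRepository({1: "a@example.com"})
    assert contact_line(repo, 1) == "Contact: a@example.com"
    assert contact_line(repo, 2) == "No contact on file"
```

Because the data lives inside the test, editing it affects only that test, unlike a shared database fixture.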

                  IMO, reused logic/structures should be limited to one test suite. But even then, rules are meant to be broken, just make sure you justify it.

                  > also a place where an LLM can save you some time.

                  I’m still not convinced that’s the case though. A typical mock takes a minute or two to write, most of the time is spent thinking about which cases to hit or refactoring code to make testing easier. Working with the LLM takes at least that long, esp if you count reviewing the generated code and whatnot.

                  > LLMs are like the junior dev

                  Right, and I don’t want a junior dev writing my tests. Junior devs are there to be trained with the expectation that they’ll learn from mistakes. LLMs don’t learn, they’re perennially junior.

                  That’s why I don’t use them for code gen and instead use them for research. Writing code is the easy part of my job, knowing what to write is what takes time, so I outsource as much of the latter as I can.

      • Draces@lemmy.world
        1 upvote · 24 days ago

        What model are you using? I’ve had such a radically different experience but I’ve only bothered with the latest models. The old ones weren’t even worth trying with

        • sugar_in_your_tea@sh.itjust.works
          2 upvotes · 24 days ago

          I’ll have to check, we have a few models hosted at our company and I forget the exact versions and whatnot. They’re relatively recent, but not the highest end since we need to host them locally.

          But the issue here isn’t directly related to which model it is, but to the way LLMs work. They cannot reason, they can only give believable output. If the goal is code coverage, it’ll get coverage, but the tests won’t necessarily be well designed.

          If both the logic and the tests are automated, humans will be lazy and miss stuff. If only the logic is generated, humans can treat the code as a black box and write good tests that way. Humans will be lazy with whatever is automated, so if I have to pick one to be hand written, it’ll be the code that ensures the logic is correct.

          • wesley@yall.theatl.social
            1 upvote · 24 days ago

            We’re mandated to use it at my work. For unit tests it can really go wild and it’ll write thousands of lines of tests to cover a single file/class, for instance, whereas a developer would probably only write a fourth as much. You have to be specific to get any decent output from them, like “write a test for this function and use inputs x and y and the expected output is z”.

            Personally I like writing tests too and I think through what test cases I need based on what the code is supposed to do. Maybe if there are annoying mocks that I need to create I’ll let the AI do that part or something.

            • sugar_in_your_tea@sh.itjust.works
              1 upvote · 24 days ago

              Generating tests like that would take longer than writing the tests myself…

              Nobody is going to thoroughly review thousands of lines of test code.

    • Flamekebab@piefed.social
      5 upvotes · 24 days ago

      I’ve seen it generate working unit tests plenty. In the sense that they pass.

      …they do not actually test the functionality. Of course that function returns what you’re asserting - you overwrote its actual output and checked against that!

    • jjjalljs@ttrpg.network
      2 upvotes · 24 days ago

      One of the guys at my old job submitted a PR with tests that basically just mocked everything, tested nothing. Like,

      with patch("something.whatever", return_value=True):
        assert whatever(0) is True
        assert whatever(1) is True
      

      Except it went on for a few dozen lines, with names that made it look like they were doing something useful.

      He used AI to generate them, of course. Pretty useless.
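
A test of the kind quoted above is vacuous when the patched target is the very function being called, because the assertions then only check the mock's `return_value`. A runnable sketch of the difference, with hypothetical names (`Parity`, `is_even`, `describe`):

```python
from unittest.mock import patch


class Parity:
    """Hypothetical code, used only to illustrate the point."""

    @staticmethod
    def is_even(n: int) -> bool:
        return n % 2 == 0

    @classmethod
    def describe(cls, n: int) -> str:
        return "even" if cls.is_even(n) else "odd"


def vacuous_test() -> None:
    # Patches the function under test, then asserts on the mock's answer.
    with patch.object(Parity, "is_even", return_value=True):
        assert Parity.is_even(0) is True
        assert Parity.is_even(1) is True  # passes, although 1 is odd


def meaningful_test() -> None:
    # The unit under test is left alone, so the real logic runs.
    assert Parity.is_even(0) is True
    assert Parity.is_even(1) is False
    # Patching a collaborator is fine when testing its caller.
    with patch.object(Parity, "is_even", return_value=False):
        assert Parity.describe(2) == "odd"
```

Both functions pass, but only `meaningful_test` would fail if `is_even` were broken.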

        • SparroHawc@lemmy.zip (OP)
          1 upvote · 23 days ago

          At least in those situations, the person writing the tests knows they’re not testing anything…

          • MangoCats@feddit.it
            1 upvote · 23 days ago

            Some do, some don’t, but more importantly: most just don’t care.

            I had a tester wander into a set of edge cases which weren’t 100% properly handled and their first reaction was “gee, maybe I didn’t see that, it sounds like I’m going to have a lot more work because I did.”