Is it bad practice to base expected results off actual results in unit testing?

Asked by: Jay Cork  Asked: 4/10/2019  Updated: 4/18/2019  Views: 773

Q:

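A co-worker was reviewing some of my unit test code for string generation, which kicked off a lengthy discussion. They said that the expected results should all be hard-coded, and they were worried that a lot of my test cases were using the thing being tested to test against.

Let's say there is a simple function that returns a string built from some arguments.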

generate_string(name, date)  # Function to test
    return "My Name is {name} I was born on {date} and this isn't my first rodeo"

----Test----

setUp
    name = 'John Doe'
    date = '1990-01-01'

test_that_generate_string_function
    ...
    expected = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"
    assertEquals(expected, actual)

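My co-worker was insistent that the expected result should always be hard-coded, as that removes any chance of the actual result influencing the expected result.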

test_date_hardcoded_method
    ...
    date = '1990-01-01'
    actual = generate_string(name, date)
    expected = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"
    assertEquals(expected, actual)

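So if they wanted to make sure the date was exactly as required, they would pass in a date value and hard-code the expected result. To me, this makes sense but also seems redundant. The function already has a test to make sure the entire string is as expected. Any deviation from that will result in a failed test. My method was to take the actual result, deconstruct it, hard-code something specific, and stitch it back together to use as the expected result.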

test_date_deconstructed_method
    ...
    date = get_date()
    actual = generate_string(name, date)
    actual_deconstructed = actual.split(' ')
    actual_deconstructed[-7] = '1990-01-01'  # Hard code small expected change
    expected = ' '.join(actual_deconstructed)
    assertEquals(expected, actual)

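I ended up creating two test units, one using each method, to see if I could understand where they were coming from, but I just don't see it. When all expected results are hard-coded, any minor change makes the vast majority of tests fail. If the "isn't" needed to become "is not", the hardcoded_method would fail until someone manually changed things. The deconstructed_method only cares about the date and would still pass its test. It would only fail if something unexpected happened to the date. With only a few tests failing after a change someone else made, it is easy to pinpoint exactly what went wrong, which I think is the entire point of unit testing.

I am still in the first month of my first programming job. My co-worker is far more experienced than I am. I have little confidence in myself and usually just accept other people's opinions as truth, but this approach makes more sense to me. I understand their idea that deriving the expected result from the actual result can be bad, but I trust all the other tests to form a web of informing tests. The string format, token values, and formatting are all covered, along with hard-coded tests that check for anything incorrect.

Should the expected result of every test be hard-coded? Is it bad to use the actual result to inform the expected result once the groundwork has already been tested?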

Tags: unit-testing, language-agnostic, hard-coding

Comments

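0 votes Guy Coder 4/10/2019
For the title, "Is it bad practice to base expected results off actual results in unit testing?", my answer is yes, it is bad practice.
0 votes Guy Coder 4/10/2019
For "My co-worker was instant that the expected result should always be hard-coded, as it stops there being any chance that the actual result can influence the expected result.": it sounds like you are paraphrasing. I don't always hard-code the results; instead I use generators that produce both the test and the result, and those generators use none of the methods or base methods from the code under test. This is often hard, because many times I have to jump through hoops to reinvent the wheel, and when I can't think of a way, I hard-code the result.
0 votes Guy Coder 4/10/2019
For "The function already has a test to make sure the entire string is as expected.": programming and proving are not the same thing. Very few programs can prove something. For "I trust all the other tests to form a web of informing tests": trust me, I have a bridge to sell.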

A:

2 votes rp.beltran 4/10/2019 #1

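Test cases should be designed with the requirements of the program in mind. If only part of the string needs to be validated, validate only that part of the string. If the whole string needs to be validated, validate the whole string. Passing the unit tests should be a strong indication that every directly testable requirement has been met.

If a bug could insert something strange into a part of the string you are not looking at, your testing method will not catch it. If that is an acceptable risk, you can choose to take that chance, but you have to recognize the possibility and decide your own tolerance.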
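
To make that trade-off concrete, here is a minimal Python sketch. `generate_string` follows the pseudocode from the question; `generate_string_buggy` is a hypothetical regression invented purely for illustration (it loses the name). A check that looks only at the date accepts the buggy output, while a full-string comparison rejects it.

```python
def generate_string(name, date):
    # Mirrors the function described in the question.
    return f"My Name is {name} I was born on {date} and this isn't my first rodeo"


def generate_string_buggy(name, date):
    # Hypothetical regression: the name is replaced by a placeholder.
    return f"My Name is ??? ??? I was born on {date} and this isn't my first rodeo"


def date_part(text):
    # The date is the 7th space-separated token counted from the end.
    return text.split(" ")[-7]


EXPECTED = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"

good = generate_string("John Doe", "1990-01-01")
bad = generate_string_buggy("John Doe", "1990-01-01")

# Date-only check: true for both outputs, so it cannot see the name bug.
date_only_results = (date_part(good) == "1990-01-01", date_part(bad) == "1990-01-01")

# Full-string check: true only for the correct output.
full_results = (good == EXPECTED, bad == EXPECTED)

print(date_only_results)  # (True, True)
print(full_results)       # (True, False)
```

Whether the gap matters depends on whether some other test already pins down the rest of the string.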

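Comments

0 votes Guy Coder 4/10/2019
"you have to recognize the possibility and decide your own tolerance." Try that philosophy at a bank or a security-sensitive site and see how long you keep your job. The company or the customer sets the requirements; if in doubt, ask.
0 votes rp.beltran 4/11/2019
@GuyCoder Haha, you are right. Maybe *you* can't decide your tolerance, but someone can. I am used to working at a startup where, perhaps unfortunately, I get to make those decisions. Still, I think there is one area where less validation is acceptable even at a big bank: UI/UX design, where the worst case is poor rendering. The example given looks a bit like front-end work, which is why I felt his requirements might be somewhat relaxed.

0 votes Dirk Herrmann 4/18/2019 #2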

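You have a function that generates a string from input data. One option is to have every test case check the entire generated string, even though the goal of each test is to verify one very specific part of that string. You are right to think this approach is bad: the resulting tests are too broad and therefore fragile. They will fail, or need maintenance, for any change at all, not only for changes that affect the specific part of the generated string under test. Take a look at Meszaros' discussion of fragile tests, in particular the part "The test says too much about how the software should be structured or behave"; you may find it enlightening: http://xunitpatterns.com/Fragile%20Test.html#Overspecified%20Software

Making your tests more focused, as you intended, really is the better solution. The approach you chose, however, is a bit strange: you take the generated string, make a copy, patch the copy with the hand-coded expected part (the part in focus in the respective test), and then compare the two complete strings, the result and the patched result. Technically, you have created a test that truly focuses only on the expected part, because the other parts of the string around it will always be equal. The approach is confusing, though: to someone who does not fully understand the test code, it looks as if you were testing the code against the results of the code itself.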
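
A small Python sketch of that equivalence, under the assumption that the date is always the 7th space-separated token from the end (as in the question's example). The helper names `patched_copy_check` and `direct_check` are made up for illustration.

```python
def patched_copy_check(actual, expected_date):
    # The question's approach: patch a copy of the actual string, then compare.
    parts = actual.split(" ")
    parts[-7] = expected_date
    return " ".join(parts) == actual


def direct_check(actual, expected_date):
    # Compare only the date token with the hard-coded expectation.
    return actual.split(" ")[-7] == expected_date


correct = "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"
wrong_date = "My Name is John Doe I was born on 1990-01-02 and this isn't my first rodeo"

# Each pair holds (patched-copy verdict, direct verdict) for one input.
outcomes = [
    (patched_copy_check(text, "1990-01-01"), direct_check(text, "1990-01-01"))
    for text in (correct, wrong_date)
]
print(outcomes)  # [(True, True), (False, False)]
```

Both checks give the same verdict on these inputs; the patched-copy version simply hides that fact behind a full-string comparison.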

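Why not do it the other way around: take the result string, cut out the part of interest, and compare that part with the hard-coded expectation? In your example, the test would look like this: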

test_date_part_of_generated_string:
   date = '1990-01-01'
   actual_full_string = generate_string(name, date)
   actual_string_parts = actual_full_string.split(' ')
   actual_date_part = actual_string_parts[-7]
   assertEquals('1990-01-01', actual_date_part)
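
For reference, a runnable Python `unittest` rendering of the pseudocode above might look like the following. This is only a sketch: the body of `generate_string` is assumed from the question's pseudocode, and the class name `GenerateStringDateTest` is invented here.

```python
import unittest


def generate_string(name, date):
    # Assumed implementation, taken from the question's pseudocode.
    return f"My Name is {name} I was born on {date} and this isn't my first rodeo"


class GenerateStringDateTest(unittest.TestCase):
    def setUp(self):
        self.name = "John Doe"

    def test_date_part_of_generated_string(self):
        actual_full_string = generate_string(self.name, "1990-01-01")
        # Cut out only the part under test ...
        actual_date_part = actual_full_string.split(" ")[-7]
        # ... and compare it with a hard-coded expectation.
        self.assertEqual("1990-01-01", actual_date_part)
```

Run it with `python -m unittest`. The sketch uses `assertEqual`, since the `assertEquals` alias is deprecated in Python's `unittest`. Note that the index `-7` ties the test to the number of words after the date, so a wording change in that part of the string would shift the position.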
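
0 votes simbo1905 4/18/2019 #3

At one point in time I would have agreed with the person reviewing your code: make the tests brutally simple. At the same time, I wanted to test every low-level part of the code to get full test coverage, and to do TDD.

The problem, as you point out, is that brutally simple tests are repetitive, and when you need to change things for a new scenario you have to change a lot of test code.

Then I coded alongside someone with twenty years more experience than me, whom I know to be a world-class programmer. He said, "Your tests are too repetitive; refactor them so they are less brittle." I said, "I thought my tests needed to be very simple and obvious, which means my code needs to be repetitive." He said, "Don't write your test code any differently from your production code; keep it DRY (don't repeat yourself)."

This then raised a whole class of meta-questions about my programming. What is enough test code? What is good test code?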

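I eventually realised that when I wrote lots of brutally simple and repetitive tests, I spent more time refactoring tests than writing new code. The large amount of repetitive test code was brittle. It wasn't keeping bugs away; it was making it harder to add features or remove technical debt. When it comes to business logic, more code is not more value. Likewise, more verbose test code doesn't help when refactoring it turns into "test debt".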

This then leads to another big point: loosely typed languages, which need lots of unit tests to show that code is correct, need lots of brittle and repetitive tests. Strongly typed languages, where the compiler can statically tell you about logic errors, mean you have to write less test code, which is less brittle, so you can refactor faster. In a loosely typed language you end up writing lots of test code that makes sure at runtime that you don't pass the wrong types. In a strongly typed functional language you only need to validate input at runtime: the compiler validates that your code works. So then you can write a few high-level tests and be confident it all works. If you refactor your code, you have fewer tests to refactor. You have tagged your question "language-agnostic", but the answer cannot be. The weaker your compiler, the more this question is a problem; the stronger your compiler, the less you have to deal with this whole issue.

I attended a four-day test-driven development course at a big software engineering shop that was done in Smalltalk. Why? Because no one knows Smalltalk, and it is untyped, so we had to write a test for everything we wrote, as we were all beginners in that language. It was fun, but I wouldn't advise anyone to use a loosely typed language where they have to write a load of tests to know it works. I would strongly advise people to use a strongly typed language where the compiler does more of the work and where there can be less test code, as that makes it easier to refactor tests when you add new functionality. Likewise, functional languages with immutable algebraic types and composition of functions need fewer tests, as they don't have lots of mutable state to worry about. The more modern the programming language, the less test code you need to write to keep bugs away.

Obviously, you cannot upgrade the language you are using at your company. So here is the one tip from my friend that has stuck with me: test code should be like production code, so do not repeat yourself. If you find your tests are becoming repetitive, then delete tests. Keep a minimum number of tests that will break if the logic is broken. Don't keep fifty-odd tests that cover all variations of string concatenation. That is "over-testing". Over-testing inhibits refactoring to add functionality and remove tech debt more than it keeps bugs away. In some languages, this means writing lots of repetitive tests as scaffolding, because you need them to validate your logic as you write it. Then, when you have it working, write larger tests that will break if someone breaks a subpart, and delete all the repetitive tests so as not to leave "test debt". This results in a few coarse-grained tests that are brutally simple without a lot of repetition.
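
As one possible illustration of keeping test code DRY, here is a sketch in Python using a single table-driven `unittest` test with `subTest`. The `generate_string` body is assumed from the question's pseudocode, and the class name and the second data row are invented for the example.

```python
import unittest


def generate_string(name, date):
    # Assumed implementation, taken from the question's pseudocode.
    return f"My Name is {name} I was born on {date} and this isn't my first rodeo"


class GenerateStringTest(unittest.TestCase):
    # One table of cases instead of one near-identical test per variation.
    CASES = [
        ("John Doe", "1990-01-01",
         "My Name is John Doe I was born on 1990-01-01 and this isn't my first rodeo"),
        ("Jane Roe", "2001-12-31",
         "My Name is Jane Roe I was born on 2001-12-31 and this isn't my first rodeo"),
    ]

    def test_generate_string(self):
        for name, date, expected in self.CASES:
            # subTest reports each failing row separately.
            with self.subTest(name=name, date=date):
                self.assertEqual(expected, generate_string(name, date))
```

If the wording of the sentence changes, only the rows in `CASES` need updating, and the expectations are still hard-coded rather than derived from the output.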