
Marking tests as "long" is outdated #10840

@ligurio

test-run.py, which we use to run Tarantool tests, allows marking functional tests as "long" (see footnote 1):

long_run - mark tests as long, enabled only with --long option (delimited with the space, e.g. long_run=t1.test.lua t2.test.lua)
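For illustration, here is a minimal Python sketch of how such an option could be read from a suite configuration. The `[default]` section name, the other keys, and the `read_long_tests` helper are assumptions made for this example; only the `long_run` option and its space-delimited format come from the README quoted above.

```python
import configparser

# Hypothetical suite.ini content. Only the long_run option and its
# space-delimited value format are taken from the test-run README.
SUITE_INI = """
[default]
core = tarantool
description = example suite
long_run = t1.test.lua t2.test.lua
"""


def read_long_tests(ini_text):
    """Return the set of test names listed in the long_run option."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # A suite without long_run simply has no "long" tests.
    raw = parser.get("default", "long_run", fallback="")
    return set(raw.split())


long_tests = read_long_tests(SUITE_INI)
print(sorted(long_tests))
```

This is not how test-run.py itself is implemented; it only shows that membership in the "long" set is a static list in a config file, with no link to measured run time.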

Tests marked as "long" are executed in a separate job in GH Actions and are skipped by default by test-run.py.

I took a look at these tests when doing PR #10216, and I see two problems with this:

  • There are no precise criteria to distinguish long tests from other tests. Often it's a matter of taste: it feels like some tests take longer than others.
  • A test added to "long_run" will most likely never be removed from it, even if it is no longer long.

Running test-run.py with --long shows that only 5 of the 10 longest tests are marked as "long":

Top 10 longest tests (seconds):
* 102.90 engine_long/delete_replace_update.test.lua:memtx    (long)
*  99.75 vinyl-luatest/select_consistency_test.lua           (long)
*  93.07 engine_long/delete_replace_update.test.lua:vinyl    (long)
*  81.26 config-luatest/failover_and_election_mode_test.lua
*  79.75 box-luatest/gh_7605_qsort_recovery_test.lua         (long)
*  79.18 config-luatest/compat_test.lua
*  73.64 box/alter-primary-index-tuple-leak-long.test.lua    (long)
*  71.89 config-luatest/log_wrapper_test.lua
*  65.56 replication-luatest/quorum_orphan_test.lua
*  61.71 config-luatest/basic_test.lua
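The count above can be reproduced with a small script. This is a sketch that assumes the report keeps the line format shown above (`* <seconds> <test name> [(long)]`); the regular expression and the `parse_report` helper are written for this example and are not part of test-run.py.

```python
import re

# The report lines from the run above, with column padding collapsed.
REPORT = """\
* 102.90 engine_long/delete_replace_update.test.lua:memtx (long)
*  99.75 vinyl-luatest/select_consistency_test.lua (long)
*  93.07 engine_long/delete_replace_update.test.lua:vinyl (long)
*  81.26 config-luatest/failover_and_election_mode_test.lua
*  79.75 box-luatest/gh_7605_qsort_recovery_test.lua (long)
*  79.18 config-luatest/compat_test.lua
*  73.64 box/alter-primary-index-tuple-leak-long.test.lua (long)
*  71.89 config-luatest/log_wrapper_test.lua
*  65.56 replication-luatest/quorum_orphan_test.lua
*  61.71 config-luatest/basic_test.lua
"""

# "* <seconds> <name>" followed by an optional "(long)" marker.
LINE_RE = re.compile(
    r"^\*\s+(?P<secs>\d+\.\d+)\s+(?P<name>\S+)(?:\s+\((?P<mark>long)\))?\s*$"
)


def parse_report(text):
    """Return (name, seconds, is_long) tuples for every report line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append(
                (
                    match.group("name"),
                    float(match.group("secs")),
                    match.group("mark") is not None,
                )
            )
    return rows


rows = parse_report(REPORT)
long_count = sum(1 for _, _, is_long in rows if is_long)
print(f"{long_count} of {len(rows)} slowest tests are marked as long")
```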

In the Tarantool source tree at the latest commit (f65de7e), 19 tests are marked as "long".
I ran them with CTest to obtain execution times (see the JUnit report tarantool.xml.zip with timings):

  1. vinyl/stress.test.lua - 33.274 s
  2. vinyl/large.test.lua - 18.654 s
  3. vinyl/write_iterator_rand.test.lua - 20.1814 s
  4. vinyl/dump_stress.test.lua - 18.8457 s
  5. vinyl/select_consistency.test.lua - 9.12993 s
  6. vinyl/throttle.test.lua - 8.32499 s
  7. replication/prune.test.lua - 43.6834 s
  8. vinyl-luatest/select_consistency_test.lua - 93.4773 s
  9. box/huge_field_map_long.test.lua - 8.38625 s
  10. box/alter-primary-index-tuple-leak-long.test.lua - 90.0156 s
  11. box-luatest/gh_7605_qsort_recovery_test.lua - 101.907 s
  12. box-luatest/gh_7670_memtx_tx_manager_idx_rand_inconsistency_test.lua - 31.2257 s
  13. xlog/snap_io_rate.test.lua - 14.1248 s
  14. sql-luatest/ghs_119_too_long_mem_values_test.lua - 62.4867 s
  15. sql-luatest/ghs_122_allocations_in_printf_test.lua - 11.7021 s
  16. sql-tap/gh-3332-tuple-format-leak.test.lua - 22.3404 s
  17. sql-tap/gh-3083-ephemeral-unref-tuples.test.lua - 37.5453 s
  18. engine_long/delete_replace_update.test.lua - 174.352 s
  19. engine_long/delete_insert.test.lua - 43.6055 s
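Timings like the ones listed above can be pulled out of a JUnit report with a few lines of Python. The sketch below assumes the common JUnit layout with `<testcase name="..." time="...">` elements; the inline sample is hand-written from three of the values above and is not the attached report, whose exact attribute set may differ.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the usual JUnit shape (an assumption; the real
# tarantool.xml may carry more attributes or nest test suites).
SAMPLE = """\
<testsuite name="tarantool" tests="3">
  <testcase name="vinyl/stress.test.lua" time="33.274"/>
  <testcase name="vinyl/throttle.test.lua" time="8.32499"/>
  <testcase name="engine_long/delete_replace_update.test.lua" time="174.352"/>
</testsuite>
"""


def slowest(xml_text, limit=None):
    """Return (name, seconds) pairs sorted from slowest to fastest."""
    root = ET.fromstring(xml_text)
    cases = [
        (case.get("name"), float(case.get("time", "0")))
        for case in root.iter("testcase")
    ]
    cases.sort(key=lambda item: item[1], reverse=True)
    # limit=None keeps the whole list.
    return cases[:limit]


for name, seconds in slowest(SAMPLE):
    print(f"{seconds:8.3f} s  {name}")
```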

It seems that most of these tests are not as long as the marker suggests: 14 of the 19 finish in under a minute.

I propose that we stop marking tests as "long" and run them like all other tests.
Alternatively, we could define exact criteria for different test sizes (see footnotes 2, 3, 4 and 5 for more on "small", "medium" and "large" tests) and support these sizes in our test runners (at least in luatest and test-run.py).
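As a rough sketch of what "exact criteria" could look like, a size can be derived from a time budget. The thresholds below (10 s and 60 s) and the `classify_size` helper are purely hypothetical placeholders; this issue does not propose specific values, and the referenced articles also define sizes by resource usage, not only by duration.

```python
# Hypothetical upper bounds in seconds, checked in order. These numbers
# are placeholders for discussion, not proposed values.
SIZE_LIMITS = [
    ("small", 10.0),
    ("medium", 60.0),
]


def classify_size(seconds):
    """Map a measured run time to a test size name."""
    for name, limit in SIZE_LIMITS:
        if seconds < limit:
            return name
    # Anything at or above the last bound is "large".
    return "large"


# A few timings taken from the CTest run listed in this issue.
timings = {
    "vinyl/throttle.test.lua": 8.32499,
    "vinyl/stress.test.lua": 33.274,
    "engine_long/delete_replace_update.test.lua": 174.352,
}

for test_name, seconds in timings.items():
    print(f"{classify_size(seconds):6s} {test_name}")
```

With explicit bounds like these, a test runner could also flag a test whose measured time no longer matches its declared size, which addresses the problem of stale "long" markers.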

Footnotes

  1. https://github.com/tarantool/test-run/?tab=readme-ov-file#test-suite

  2. https://testing.googleblog.com/2010/12/test-sizes.html

  3. https://testing.googleblog.com/2011/03/how-google-tests-software-part-five.html

  4. https://mike-bland.com/2011/11/01/small-medium-large.html

  5. https://abseil.io/resources/swe-book/html/ch14.html

Labels

code health: Improve code readability, simplify maintenance and so on