South China University of Technology: AI Tailors Test Suites for Software Product Lines

Published: 2024-09-28

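In a world woven from code, every piece of software resembles a maze: carefully designed yet full of unknowns. Who guides us through these mazes and makes sure each step is sound? The answer is automated test suite generation, which probes every corner of the software and selects the best test paths to safeguard its robustness and reliability.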

(Image from the internet)

Figure 1. Software testing is like navigating a maze

I. Driven by Diversity: Software Product Line Testing

A software product line (SPL) (Clements and Northrop, 2002) is an efficient development paradigm that customizes software on top of shared features to meet different user needs. An SPL uses feature models (FMs) (Batory, 2005) to define the common and variable features of a family of related software products. As the number of variable features grows, however, testing every product variant individually becomes impractical, which has driven the development of automated test suite generation. Automated test suite generation aims to produce a set of test cases that reveals as many defects in the products as possible. Generation is usually guided by objective functions, such as t-wise coverage or suite diversity, that quantify the quality of the test cases.
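To make the scaling problem concrete, the sketch below encodes a toy feature model as Boolean constraints and enumerates its valid products. The feature names and constraints are hypothetical (they are not taken from Figure 2), and brute-force enumeration is only feasible for very small models.

```python
from itertools import product

# Hypothetical features of a toy phone product line (an assumption for illustration).
FEATURES = ["Phone", "Calls", "Basic", "HD", "Camera", "GPS"]


def is_valid(cfg):
    """Check a configuration (feature name -> bool) against the toy constraints."""
    return bool(
        cfg["Phone"]                            # root feature is always selected
        and cfg["Calls"]                        # mandatory feature
        and cfg["Basic"] != cfg["HD"]           # exactly one screen type (XOR group)
        and (not cfg["Camera"] or cfg["HD"])    # cross-tree constraint: Camera requires HD
    )


def valid_products():
    """Enumerate every assignment of the features and keep the valid ones."""
    result = []
    for bits in product([False, True], repeat=len(FEATURES)):
        cfg = dict(zip(FEATURES, bits))
        if is_valid(cfg):
            result.append(cfg)
    return result


products = valid_products()
print(len(products), "valid products out of", 2 ** len(FEATURES), "assignments")
# prints: 6 valid products out of 64 assignments
```

Each additional optional feature can double the number of valid products, which is why exhaustive per-product testing stops being practical for realistic models.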

Figure 2. Feature model of a simplified mobile phone product line

Previous studies usually model automated test suite generation as a single-objective or multi-objective optimization problem. Single-objective optimization (Al-Hajjaji et al., 2016; Krieter et al., 2020; Luo et al., 2022) focuses on one objective, such as maximizing coverage or minimizing test cost, and produces only one test suite per run, which makes it hard to serve different testing scenarios. Multi-objective optimization (Deb et al., 2002; Markiegi et al., 2017) considers several objectives at once and produces multiple test suites per run; the resulting suites offer higher coverage and diversity, but the algorithm design must deal with conflicting objectives and computational complexity.

To address this problem, recent research from the Intelligent Algorithm Research Center introduces the Quality-Diversity (QD) optimization framework (Pugh et al., 2016) and solves the test suite generation problem with the MAP-Elites algorithm (Mouret and Clune, 2015). The method efficiently generates diverse test suites and outperforms single-objective and multi-objective optimization methods in reducing test cost and increasing coverage; compared with algorithms based on Novelty Search (NS), it also shows better test suite diversity and stronger fault detection capability. The work (Xiang et al., 2024) has been published in ACM Transactions on Software Engineering and Methodology, a top journal in software engineering (CCF-A, JCR Q1, impact factor 6.6); its code (https://github.com/gzhuxiangyi/SPLTestingMAP) and data (https://doi.org/10.5281/zenodo.7805017) are publicly available for follow-up research.

II. Optimized Exploration: A Test Suite Generation Model Based on QD Optimization

QD optimization searches for the best solution within each behavioral region of the solution space; these solutions are not only high-performing but also have distinct behavior descriptors in the behavior space. In the SPL testing scenario, a solution is a test suite, i.e., $TS = \{tc_1, tc_2, \ldots, tc_N\}$, where $N$ is the number of test cases in the suite. Test cases are generated by tools such as SamplingCA (Luo et al., 2022) and PLEDGE (Henard et al., 2013) and encoded as n-bit binary strings, where n is the number of features and 1 and 0 mark a feature as selected or deselected.
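The binary encoding can be illustrated with a minimal sketch. The feature list below is hypothetical, and `encode` and `decode` are illustrative helpers, not functions of SamplingCA or PLEDGE.

```python
# Hypothetical, ordered feature list; position i in the bit string refers to FEATURES[i].
FEATURES = ["Phone", "Calls", "Basic", "HD", "Camera", "GPS"]


def encode(selected):
    """Encode a product (set of selected feature names) as an n-bit string."""
    return "".join("1" if feature in selected else "0" for feature in FEATURES)


def decode(bits):
    """Recover the set of selected feature names from an n-bit string."""
    return {feature for feature, bit in zip(FEATURES, bits) if bit == "1"}


# A test suite is simply a list of encoded test cases.
test_suite = [
    encode({"Phone", "Calls", "Basic"}),
    encode({"Phone", "Calls", "HD", "Camera"}),
]
print(test_suite)  # ['111000', '110110']
```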

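We assume the fitness function $f$ is to be maximized. Let $B$ be the behavior space. The goal of QD optimization is to find, for each point $b \in B$, the solution $x^*$ with the maximum fitness value. Formally:

$$\forall b \in B:\quad x^* = \arg\max_{x} f(x) \quad \text{s.t.}\quad \mathrm{bd}(x) = b \tag{1}$$

where $\mathrm{bd}(x)$ denotes the behavior descriptor of solution $x$.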

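The behavior space is the core of QD optimization. In the QD-based test suite generation model, $B$ is defined as a one-dimensional space with test suite size as the only behavior descriptor. To limit the computational burden, the test suite size is confined to an interval $[N_{lb}, N_{ub}]$ whose lower and upper bounds are specified by the engineer according to preference or available resources.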

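The fitness value quantifies the quality of a solution. For testing small-scale FMs, t-wise coverage (Lopez-Herrejon et al., 2015) is used directly as the fitness function, defined as:

$$\text{coverage}_t(TS) = \frac{\left|\bigcup_{i=1}^{N} T_{tc_i}\right|}{\left|T_{FM}\right|}$$

where $T_{FM}$ is the set of all valid t-sets of the FM, $T_{tc_i}$ is the set of t-sets covered by test case $tc_i$, $N$ is the total number of test cases in the suite, and $|\cdot|$ denotes the number of elements in a set.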
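As a minimal sketch of t-wise coverage, the code below computes the fraction of valid t-sets that a test suite covers. It assumes the valid t-sets can be collected by enumerating every valid product of a toy model, which only works for very small FMs; the six bit strings are hypothetical.

```python
from itertools import combinations


def t_sets(test_case, t):
    """Return the t-sets covered by one test case.

    A t-set is modeled here as a tuple of t (feature index, bit) pairs.
    """
    return set(combinations(enumerate(test_case), t))


def t_wise_coverage(test_suite, valid_products, t=2):
    """Fraction of valid t-sets covered by at least one test case in the suite."""
    valid = set().union(*(t_sets(p, t) for p in valid_products))
    covered = set().union(*(t_sets(tc, t) for tc in test_suite))
    return len(covered & valid) / len(valid)


# Hypothetical complete list of valid products of a toy 6-feature model.
VALID_PRODUCTS = ["111000", "111001", "110100", "110101", "110110", "110111"]

print(t_wise_coverage(VALID_PRODUCTS, VALID_PRODUCTS))      # 1.0: every valid 2-set is covered
print(t_wise_coverage(VALID_PRODUCTS[:2], VALID_PRODUCTS))  # a smaller suite covers only part
```

Because both sets have to be rebuilt for every candidate suite, the cost grows quickly with the number of features, which is consistent with using a cheaper surrogate on large models.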

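For large-scale FMs, optimizing t-wise coverage directly is very time-consuming, so test suite diversity is used as a surrogate. Test suite diversity (Xiang et al., 2021) is defined in terms of the novelty scores of the test cases in the suite, where the novelty score $\rho(tc_i)$ of a test case is its average distance to its k nearest neighbors in the suite:

$$\rho(tc_i) = \frac{1}{k}\sum_{j=1}^{k} d\left(tc_i, \mu_{i,j}\right)$$

where $d(\cdot,\cdot)$ is the distance between two test cases and $\mu_{i,j}$ denotes the j-th nearest neighbor of the i-th test case in the test suite $TS$.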
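The novelty score described above (average distance to the k nearest neighbors in the suite) can be sketched as follows. Hamming distance is an assumption made for illustration; the article does not state which distance measure is used.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def novelty_score(index, test_suite, k):
    """Average distance from test case `index` to its k nearest neighbors in the suite."""
    target = test_suite[index]
    distances = sorted(
        hamming(target, other) for j, other in enumerate(test_suite) if j != index
    )
    nearest = distances[:k]
    if not nearest:  # a suite with a single test case has no neighbors
        return 0.0
    return sum(nearest) / len(nearest)


suite = ["111000", "111001", "110110"]
print([novelty_score(i, suite, k=2) for i in range(len(suite))])  # [2.0, 2.5, 3.5]
```

The third test case differs most from the other two, so it receives the highest novelty score.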

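In short, the QD-based test suite generation model takes both performance and diversity into account and offers users a rich, diverse collection of high-performing test suites. Compared with traditional single-objective or multi-objective optimization, the approach is more general and flexible and adapts better to different testing needs and preferences.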

III. Distilling the Elites: The MAP-Elites Algorithm and Its Application

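To solve the QD model, the work adopts the well-known QD algorithm MAP-Elites (Mouret and Clune, 2015). MAP-Elites is a heuristic algorithm that explores the behavior space to find the best solution in each behavioral region. It takes the feature model, the test suite size bounds ($N_{lb}$ and $N_{ub}$), and a termination condition as input, and outputs a series of test suites to choose from. The overall flow is shown in the figure below.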

Figure 3. Flow of the MAP-Elites algorithm for automated test suite generation

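The algorithm consists of four main steps:

1. Initialization: create an empty archive whose size matches the behavior space, which here is the one-dimensional range of test suite sizes. Every cell is initially set to null; the PLEDGE tool is then used to randomly generate initial (seed) solutions, which are stored in the archive.

2. Random selection: in each iteration, randomly select one solution from the current archive.

3. Mutation: the selected solution is copied and mutated to produce a new solution; the mutation operator is determined by the test suite size and its lower and upper bounds, following the strategy in Figure 4.

Figure 4. Decision flowchart for choosing the mutation operator

4. Archive update: evaluate the new solution; if it is better than the solution currently stored in the same cell, replace that cell with it.

Steps 2, 3, and 4 are repeated until the preset termination condition is met.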
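The four steps can be sketched as a small loop. This is an illustration under stated assumptions, not the authors' implementation: `POOL` is a hypothetical list of valid test cases standing in for PLEDGE output, `fitness` counts covered 2-sets as a stand-in for 2-wise coverage, the archive is seeded with a single suite, empty cells accept any candidate, and the mutation operator is chosen uniformly among the applicable ones rather than by the decision flow of Figure 4. The parameter names `n_lb` and `n_ub` are also illustrative.

```python
import random
from itertools import combinations

# Hypothetical pool of valid test cases of a toy 6-feature model.
POOL = ["111000", "111001", "110100", "110101", "110110", "110111"]


def fitness(suite):
    """Number of distinct 2-sets covered by the suite (stand-in for 2-wise coverage)."""
    covered = set()
    for test_case in suite:
        covered.update(combinations(enumerate(test_case), 2))
    return len(covered)


def mutate(suite, n_lb, n_ub, rng):
    """Copy the suite and remove, add, or replace one test case within the size bounds."""
    operators = ["replace"]
    if len(suite) > n_lb:
        operators.append("remove")
    if len(suite) < n_ub:
        operators.append("add")
    operator = rng.choice(operators)
    child = list(suite)
    if operator == "remove":
        child.pop(rng.randrange(len(child)))
    elif operator == "add":
        child.append(rng.choice(POOL))
    else:
        child[rng.randrange(len(child))] = rng.choice(POOL)
    return child


def map_elites(n_lb, n_ub, iterations, seed=0):
    """Return an archive mapping each suite size in [n_lb, n_ub] to its best suite.

    Requires n_lb >= 1 so that every suite holds at least one test case.
    """
    rng = random.Random(seed)
    # Step 1: empty archive with one cell per suite size, then one random seed suite.
    archive = {size: None for size in range(n_lb, n_ub + 1)}
    archive[n_lb] = [rng.choice(POOL) for _ in range(n_lb)]
    for _ in range(iterations):
        # Step 2: pick a random solution from the filled cells.
        parent = rng.choice([s for s in archive.values() if s is not None])
        # Step 3: mutate a copy of it.
        child = mutate(parent, n_lb, n_ub, rng)
        # Step 4: keep the child if its cell is empty or it beats the incumbent.
        cell = len(child)
        if archive[cell] is None or fitness(child) > fitness(archive[cell]):
            archive[cell] = child
    return archive


for size, suite in map_elites(n_lb=1, n_ub=4, iterations=500).items():
    print(size, suite, fitness(suite) if suite is not None else None)
```

Because removal and addition move a solution into a neighboring cell, a good suite of one size can seed improvements for other sizes, which is the information sharing between cells that the archive makes possible.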

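To apply MAP-Elites, the software engineer specifies the range of test suite sizes; the algorithm then produces a diverse set of high-performing suites for that range, making testing decisions more flexible and targeted.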

IV. Efficient and Effective: Experimental Evaluation of the Test Suite Generation Method

To validate the effectiveness of the MAP-Elites-based test suite generation technique for software product lines, the study examined it through a series of comparative experiments. The experiments used 105 feature models (FMs) that are widely used to evaluate SPL testing techniques, including both real-world and artificially generated models, and adopted QD-Score (Pugh et al., 2015) as the evaluation metric.
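QD-Score sums the fitness of the best solution in every filled cell of the archive, so it rewards both higher quality per cell and more cells filled. A minimal sketch, with hypothetical archives that map each test suite size to the fitness of its elite (or `None` for an empty cell):

```python
def qd_score(cell_fitness):
    """Sum the fitness values of all filled cells; empty cells contribute nothing."""
    return sum(value for value in cell_fitness.values() if value is not None)


# Hypothetical archives: test suite size -> fitness of the best suite found.
archive_a = {2: 0.5, 3: 0.75, 4: None}  # the cell for size 4 was never filled
archive_b = {2: 0.5, 3: 0.75, 4: 1.0}

print(qd_score(archive_a))  # 1.25
print(qd_score(archive_b))  # 2.25
```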

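Compared with the single-objective approach (multiple independent runs of genetic algorithms, MI-GA), MAP-Elites significantly outperforms MI-GA on all FMs, whether t-wise coverage or test suite diversity is used as the fitness function. As Table 1 and Figure 5 show, MAP-Elites achieves outstanding QD-Scores when 2-wise coverage is the optimization objective, and it also maintains good 2-wise coverage when optimizing for test suite diversity.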

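Table 1. QD-Scores of MAP-Elites and MI-GA when the fitness function is 2-wise coverage

(The complete table of experimental data is available in the original paper.)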

Figure 5. Mann-Whitney U test and statistics for the comparison between MAP-Elites and MI-GA when the fitness function is suite diversity

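Further analysis shows that MAP-Elites performs well because its three mutation operators (test case removal, addition, and replacement) enable information sharing among QD subproblems, whereas MI-GA uses only replacement and lacks such a sharing mechanism. As Figure 6 shows, in a typical run, removal and addition in MAP-Elites lead to successful updates, that is, to better test suites with higher coverage, more often than replacement does.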

Figure 6. Number of successful updates from test case removal (R), addition (A), and replacement (S) on 9 representative FMs

In addition, MAP-Elites generally matches or outperforms the multi-objective evolutionary algorithm NSGA-II (Deb et al., 2002) in t-wise coverage. Compared with existing t-wise testing tools (Al-Hajjaji et al., 2016; Krieter et al., 2020; Luo et al., 2022), MAP-Elites is better at generating diverse, high-performing test suites, and is particularly effective at reducing the size of the covering arrays that those tools generate. For more experimental details and results, please refer to the full paper.

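In summary, the QD-based MAP-Elites algorithm performs well in automated test suite generation for SPLs: on both small-scale and large-scale feature models, it improves the quality and diversity of test suites while maintaining good computational efficiency. In future work, the Intelligent Algorithm Research Center will further explore QD-based test suite generation, including extending the behavior space, studying multi-objective QD optimization, and adopting more advanced QD algorithms and SAT solvers. The center also plans to run tests on real software product lines to advance test suite generation technology.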

References

Clements P, Northrop L. Software product lines[M]. Boston: Addison-Wesley, 2002.
Batory D. Feature models, grammars, and propositional formulas[C]//International Conference on Software Product Lines. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005: 7-20.
Al-Hajjaji M, Krieter S, Thüm T, et al. IncLing: efficient product-line testing using incremental pairwise sampling[J]. ACM SIGPLAN Notices, 2016, 52(3): 144-155.
Krieter S, Thüm T, Schulze S, et al. YASA: yet another sampling algorithm[C]//Proceedings of the 14th International Working Conference on Variability Modelling of Software-Intensive Systems. 2020: 1-10.
Luo C, Zhao Q, Cai S, et al. SamplingCA: effective and efficient sampling-based pairwise testing for highly configurable software systems[C]//Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2022: 1185-1197.
Deb K, Pratap A, Agarwal S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II[J]. IEEE transactions on evolutionary computation, 2002, 6(2): 182-197.
Markiegi U, Arrieta A, Sagardui G, et al. Search-based product line fault detection allocating test cases iteratively[C]//Proceedings of the 21st International Systems and Software Product Line Conference-Volume A. 2017: 123-132.
Pugh J K, Soros L B, Stanley K O. Quality diversity: A new frontier for evolutionary computation[J]. Frontiers in Robotics and AI, 2016, 3: 202845.
Mouret J B, Clune J. Illuminating search spaces by mapping elites[J]. arXiv preprint arXiv:1504.04909, 2015.
Xiang Y, Huang H, Li S, et al. Automated test suite generation for software product lines based on quality-diversity optimization[J]. ACM Transactions on Software Engineering and Methodology, 2024, 33(2): 1-52.
Henard C, Papadakis M, Perrouin G, et al. PLEDGE: a product line editor and test generation tool[C]//Proceedings of the 17th International Software Product Line Conference Co-Located Workshops. 2013: 126-129.
Lopez-Herrejon R E, Linsbauer L, Egyed A. A systematic mapping study of search-based software engineering for software product lines[J]. Information and software technology, 2015, 61: 33-51.
Xiang Y, Huang H, Li M, et al. Looking for novelty in search-based software product line testing[J]. IEEE Transactions on Software Engineering, 2021, 48(7): 2317-2338.
Pugh J K, Soros L B, Szerlip P A, et al. Confronting the challenge of quality diversity[C]//Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation. 2015: 967-974.

Editor-in-Chief: 黄翰

Managing Editor: 雷墨鹥兮

Text: 向毅, 梁靖欣

Images: 向毅, 梁靖欣

Proofreading: 陈嘉慧

Date: August 9, 2024