IMPROVING THE RELIABILITY AND VALIDITY OF TEST DATA ADEQUACY IN PROGRAMMING ASSESSMENTS

Rohaida Romli, Shahida Sulaiman, Kamal Zuhairi Zamli

Abstract


Automatic Programming Assessment (APA) has recently become a notable method for helping educators of programming courses to assess and grade students’ programming exercises automatically, because its counterpart, typical manual assessment, is prone to errors and leads to inconsistency. In practice, this method also reduces educators’ workload effectively. Test data generation plays an important role in performing dynamic testing on students’ programs. Dynamic testing involves executing a program against different inputs, or test data, and comparing the results with the expected output, which must conform to the program specifications. In the software testing field, there are diverse automated methods for test data generation. Unfortunately, APA rarely adopts these methods. Few studies have attempted to integrate APA and test data generation to include more useful features and to provide precise and thorough program testing. Thus, we propose a test data generation framework known as FaSt-Gen that covers both functional and structural testing of a program for APA. Functional testing relies on specified functional requirements and focuses on the output generated in response to the selected test data and execution. Meanwhile, structural testing looks at the specific program logic to verify how it works. Overall, FaSt-Gen gives educators of programming courses a means to furnish an adequate set of test data for assessing students’ programming solutions, even without optimal expertise in test case design. FaSt-Gen integrates positive and negative testing criteria, the so-called reliable and valid test adequacy criteria, to derive the desired test data and test set schema. For functional testing, it integrates the specification-derived test and simplified boundary value analysis techniques, covering both criteria. The path coverage criterion guides test data selection for structural testing. The findings from a controlled experiment and a comparative evaluation show that FaSt-Gen improves the reliability and validity of test data adequacy in programming assessments.
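
To make the functional-testing idea concrete, the following is a minimal Python sketch of boundary value analysis with positive and negative test data, followed by a dynamic test that compares a student's output with the expected output. It is an illustration under stated assumptions, not FaSt-Gen's actual algorithm: the function names (boundary_values, run_dynamic_test), the grading exercise, and the choice of boundary points are all hypothetical.

```python
from typing import Callable, Dict, List


def boundary_values(low: int, high: int) -> Dict[str, List[int]]:
    """Derive test data for an integer input specified as low <= x <= high.

    Assumption: one common simplified form of boundary value analysis,
    not necessarily the variant used by FaSt-Gen.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    mid = (low + high) // 2
    candidates = (low, low + 1, mid, high - 1, high)
    # Positive test data: valid values at and near each boundary, plus a nominal value.
    positive = sorted({v for v in candidates if low <= v <= high})
    # Negative test data: invalid values just outside each boundary.
    negative = [low - 1, high + 1]
    return {"positive": positive, "negative": negative}


def reference_grade(mark: int) -> str:
    """Hypothetical model solution: marks are valid in the range 0..100."""
    if not 0 <= mark <= 100:
        return "invalid"
    return "pass" if mark >= 50 else "fail"


def student_grade(mark: int) -> str:
    """Hypothetical student solution that forgets the upper-bound check."""
    if mark < 0:
        return "invalid"
    return "pass" if mark >= 50 else "fail"


def run_dynamic_test(
    student: Callable[[int], str],
    reference: Callable[[int], str],
    test_data: List[int],
) -> List[int]:
    """Execute both programs on each input and return the inputs whose outputs differ."""
    return [x for x in test_data if student(x) != reference(x)]


if __name__ == "__main__":
    data = boundary_values(0, 100)
    print("positive failures:", run_dynamic_test(student_grade, reference_grade, data["positive"]))
    print("negative failures:", run_dynamic_test(student_grade, reference_grade, data["negative"]))
```

In this sketch the faulty student solution passes every positive test datum and fails only on the negative datum 101, which suggests why a test set restricted to valid inputs may be inadequate. Structural testing guided by path coverage is not covered here.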


Keywords


Automatic Programming Assessment (APA), test data generation, functional testing, structural testing, test data adequacy, positive testing, negative testing

DOI: https://doi.org/10.11113/jt.v77.6201

Copyright © 2012 Penerbit UTM Press, Universiti Teknologi Malaysia.