It would be no exaggeration to say that Artificial Intelligence (AI) is responsible for a revolution in software development, and it has greatly impacted software testing in particular. In this article, we’d like to take a closer look at how AI is impacting testing and test automation.
AI and large language models (LLMs) are all the rage now – with good reason. LLMs are extraordinary algorithms and currently the most prominent branch of machine learning, but they aren’t the be-all and end-all of what artificial intelligence, or even machine learning, can do. They are not general-purpose intelligence; they are, however, well suited to enabling new kinds of software development and testing automation. Here’s how.
How Is AI Transforming Software Testing and Test Automation?
In recent decades, software testing has gone through numerous transformations: it started with manual testing, migrated to automated testing, and is now transitioning to AI-centric testing strategies. AI is already being used in several areas of QA. Let’s explore in detail some of the key roles AI plays within the realm of QA.
Essential AI tools and their applications in software testing:
- GitHub Copilot – produces and autocompletes code and comments. Several test automation solutions at our company have already taken advantage of this service, and it is one of the strongest AI-powered services on our list.
- ChatGPT – a popular AI system. For a test engineer, ChatGPT can help generate test cases from requirements, traceability matrices, test plans and strategies, test data, SQL queries, API clients, XPath locators for elements, regular expressions, and more. On the test automation side, ChatGPT can create utilities and helpers, test data, generated code, and more. There are also paid versions with more features: ChatGPT Plus lets you build custom GPTs trained on your own data – user-created documents (guidelines, best practices, policies), images, or other files – helps plan a test strategy and develop a complete test plan including test cases, and assists in reviewing mockups during development. These are only the most popular examples of how ChatGPT is used.
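To make the test-data use case concrete, here is a hedged sketch of the kind of artefact a test engineer might ask ChatGPT to produce: a regular expression for validating email addresses in generated test data. The pattern and helper name are illustrative (and deliberately simpler than full RFC 5322), not output from any specific model.

```python
import re

# Intentionally simple email pattern of the sort an LLM can generate on request;
# it covers common cases but is not a complete RFC 5322 validator.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(value: str) -> bool:
    """Return True if the value looks like a plausible email address."""
    return EMAIL_RE.fullmatch(value) is not None
```

Whatever the model returns, it still needs exactly this kind of review and testing before it goes into a suite.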
Overview of AI in End-To-End Test Automation
Software developers embed AI into test automation pipelines so that it can generate code and execute tests from a test case or from requirements. The idea itself is not new: comparable frameworks were on the market before 2023. Some claimed to automatically heal locators (Selenium reinforcement learning approaches, Healenium, Testim, and so on), and some even claimed to generate code for automated tests based on test cases.
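The self-healing idea behind tools like Healenium can be sketched in a few lines: try the primary locator, and if it no longer matches, probe a list of fallback candidates and remember the one that worked. This is an illustrative toy, not any tool’s actual algorithm; `find` here stands in for a framework call such as Playwright’s `page.query_selector`, mocked below with a plain dict.

```python
from typing import Callable, Iterable, Optional

def find_with_healing(
    find: Callable[[str], Optional[object]],
    primary: str,
    fallbacks: Iterable[str],
) -> tuple[Optional[object], str]:
    """Try the primary locator; on failure, 'heal' by probing fallback locators.

    Returns the found element (or None) and the locator that actually matched.
    """
    element = find(primary)
    if element is not None:
        return element, primary
    for candidate in fallbacks:
        element = find(candidate)
        if element is not None:
            # A real tool would persist this healed locator for future runs.
            return element, candidate
    return None, primary

# Mocked page after a UI change: only the new id-based locator still matches.
dom = {"#login-btn-v2": "<button>"}
element, used = find_with_healing(
    dom.get, "#login-btn", ["//button[text()='Login']", "#login-btn-v2"]
)
```

Real healing tools add the hard parts this sketch omits: generating candidate locators automatically and deciding when a healed match is trustworthy.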
The Main Disadvantages of Test Automation Solutions Based on AI
The general idea is that we should be able to auto-fix a test before it fails and, in an ideal world, never even write the test ourselves but have an AI framework do it for us. It sounds like a great idea, but if it worked reliably, we wouldn’t still need Selenium and other test frameworks that require coding and engineers.
- Slowness: data must be sent to an AI backend, processed there, and only then returned to the client.
- Stability: there’s no way to verify the generated code 100 per cent; after the UI changes, the generated code still needs to be reviewed (like any generated code, it may be wrong and need fixing). We don’t want to build our testing – and therefore our trust – on such generated code.
- Dependency on other wrappers: our experiments showed that popular AI libraries pull in further libraries to access AI services such as the OpenAI API, so extra dependencies are introduced.
- Cost: AI-powered tools typically require a monthly or yearly subscription.
These disadvantages keep AI-powered UI test automation tools less popular than the original open-source tools such as Selenium, Cypress, Playwright, and WebDriver.IO.
Auto Playwright
This wrapper sits on top of Playwright and invokes the OpenAI API under the hood to generate code and locators from the page’s DOM. To create and run tests, you need your own OpenAI API token. We found some limitations during testing: Auto Playwright is naturally slower to execute than plain Playwright, since the entire page is sent to OpenAI. The cost of usage depends on the number of tokens sent; we processed approximately 10 HTML pages at about $0.1–$0.5 per 10 API calls (or steps) – although prices have been cut over recent months and it is now cheaper. To reduce costs further, Auto Playwright uses an HTML sanitiser library that shrinks the HTML document before it is sent.
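The sanitisation idea is simple: strip out content that carries no locator information before the page goes to the LLM, so fewer tokens are billed. The sketch below is an illustrative toy version of that idea, not Auto Playwright’s actual sanitiser; it drops scripts, styles, and comments with regular expressions and collapses whitespace.

```python
import re

def sanitize_html(html: str) -> str:
    """Shrink a page before sending it to an LLM by removing content that
    contributes nothing to locator generation: scripts, styles, comments,
    and redundant whitespace."""
    html = re.sub(r"<script\b.*?</script>", "", html, flags=re.S | re.I)
    html = re.sub(r"<style\b.*?</style>", "", html, flags=re.S | re.I)
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    return re.sub(r"\s+", " ", html).strip()

page = (
    "<html><head><style>p{color:red}</style></head><body>\n"
    "<script>track()</script><p id='msg'>Hello</p></body></html>"
)
small = sanitize_html(page)
```

Fewer characters in, fewer tokens billed – which is exactly why the sanitisation step matters for the per-call prices quoted above.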
Quality-wise, the generated tests were runnable only for trivial use cases on English-language web apps. For more complex ones, tests failed due to missing locators or timeouts.
ZeroStep
A few experiments with ZeroStep showed that it is faster than Auto Playwright. ZeroStep works through its own backend, but under the hood it also uses OpenAI. To work with it, you register on their website and receive a token for your tests. The free version is limited to 500 AI function calls per month (the number may differ by the time you read this), and for $20 per month you can make up to 2,000 calls (again, this may change). Even so, ZeroStep remains slower than plain Playwright. While working with this framework, we had issues with websites in other languages and noticed that not all controls were recognised even in English web applications.
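Using the plan limits quoted above (which, again, may have changed by the time you read this), a quick back-of-the-envelope cost model for a ZeroStep-style subscription looks like this; the function and defaults are our own illustration, not a vendor calculator:

```python
def monthly_cost(
    ai_calls: int,
    free_quota: int = 500,    # free-tier AI function calls per month
    paid_price: float = 20.0, # flat monthly subscription price in USD
    paid_quota: int = 2000,   # calls covered by the paid plan
) -> float:
    """Rough monthly cost for a tiered plan: free up to `free_quota` calls,
    a flat `paid_price` up to `paid_quota` calls, and out of quota beyond."""
    if ai_calls <= free_quota:
        return 0.0
    if ai_calls <= paid_quota:
        return paid_price
    raise ValueError("call volume exceeds the paid plan's quota")
```

For a suite averaging, say, 5 AI steps per test, the free tier covers roughly 100 test runs a month – worth estimating before committing to such a tool.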
Comparison of Auto Playwright vs. ZeroStep Frameworks
| Comparison criteria | Auto Playwright (PoC) | ZeroStep (commercial) |
| --- | --- | --- |
| Uses OpenAI API | Yes | Yes |
| Uses HTML sanitisation | Yes | No |
| Uses Playwright API | Yes | No – it uses some Playwright API but mainly relies on Chrome DevTools Protocol (CDP) |
| Snapshots/caching | No | Yes |
| Implements parallelism | No | Yes |
| License | MIT | MIT |
| Allows scrolling | No | Yes |
Auto Playwright is more of a proof of concept than a ready-to-use commercial tool. ZeroStep, on the other hand, is a commercial tool. Still, its slowness, lack of proper control identification, and other limitations prevent it from being used as a drop-in replacement for the original Playwright.
Generating test scripts from requirements is an excellent idea, and it looks fantastic in demos. But it is still far from commercial use due to the cost of code generation and execution, slow execution speed, incorrect handling of more sophisticated web applications, and the lack of support for dynamic elements that appear in the DOM only after the user performs some action in the UI. We believe this technology will improve significantly with time and will at least become usable for simple smoke tests.
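The dynamic-element problem mentioned above is what frameworks like Playwright solve natively with auto-waiting. To illustrate the underlying idea, here is a minimal polling helper – our own sketch, not any framework’s implementation – where `probe` stands in for a locator lookup that returns nothing until the element enters the DOM:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def wait_for(
    probe: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> T:
    """Poll `probe` until it returns a value or the timeout expires,
    mimicking how test frameworks wait for elements that appear in the
    DOM only after a user action."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("element did not appear in time")
        time.sleep(interval)

# Simulate an element that appears in the DOM after a short delay.
appear_at = time.monotonic() + 0.1
element = wait_for(
    lambda: "<div>" if time.monotonic() >= appear_at else None, timeout=2.0
)
```

AI code generators that snapshot the page once miss exactly this: the element simply is not in the DOM they were shown.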
How AI-Powered Visual Analysis Enhances Application Testing
Many tools and services also rely on AI to perform visual analysis. Testing-wise, these solutions take images as input, process them, and produce output for various testing tasks: test generation, quality feedback and reporting, image comparison, OCR operations, processing complex PDF documents, and so on.
- The public release of AI models such as GPT-4 Turbo with Vision has opened up a massive space for thousands of startups and products.
- App Quality Copilot is one service that provides feedback based on screenshots. It is new to the market – its first public version appeared at the beginning of 2024 – and it provides three main functions:
  - Generation and execution of mobile auto tests (end-to-end UI) from requirements.
  - Bug reports on screenshots from mobile applications, desktops, and mobile browsers, with insights gathered from different perspectives: functional issues, translation issues, UI/UX, missing data, broken images, etc.
  - Generation of test cases from requirements.
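Classic image comparison – one of the visual testing tasks listed above – reduces to counting how many pixels differ between a baseline screenshot and a new one. The toy version below, our own illustration rather than any service’s algorithm, represents greyscale screenshots as lists of pixel rows:

```python
def diff_ratio(img_a: list[list[int]], img_b: list[list[int]]) -> float:
    """Fraction of differing pixels between two equally sized greyscale
    'screenshots'; a visual test would fail when this exceeds a threshold."""
    if len(img_a) != len(img_b) or any(
        len(r) != len(s) for r, s in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in img_a)
    differing = sum(
        1
        for row_a, row_b in zip(img_a, img_b)
        for px_a, px_b in zip(row_a, row_b)
        if px_a != px_b
    )
    return differing / total

baseline = [[0, 0], [255, 255]]
candidate = [[0, 0], [255, 0]]  # one pixel changed, e.g. a broken image region
```

AI-based visual analysis goes further than raw pixel diffs – it can judge *why* a region differs (translation issue, broken image, layout bug) rather than merely that it does.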
Challenges in Incorporating AI Tools for Test Automation and Testing
But the world of AI changes every day. There are now thousands of tools on the AI market, not hundreds, and most are just wrappers around the top five AI engines (OpenAI, Gemini, Anthropic Claude API, etc.).
The main challenges we face when incorporating AI tools for automated testing are:
- The market landscape for AI products is volatile and fast-moving: leaders constantly compete to release new features, lower prices, and improve their models’ performance.
- There are thousands of tools and wrappers, but only some are interesting and worth trying.
- Security and sensitive data should be a priority. AI vendors carry their own risks and offer varying levels of security; their security policies should be reviewed carefully, and extra agreements with the customer may be necessary before using any AI tool on sensitive data.
Key Takeaways
AI technology can improve performance in many ways, and we should use it. The more AI technology matures, the more potential it offers for better software testing. At the same time, it is essential to prevent misuse of the technology and to avoid leaking data through it.