Wednesday, November 6, 2013

A Tester's Perspective: What Went Wrong with Healthcare.gov?

Background

By now everyone has heard about the healthcare.gov website rollout. By all accounts, the initial rollout has been a colossal failure. In this multi-part blog post we will examine some of the key issues that caused the rollout to go awry.

In a software project of this size, it’s no surprise that there are issues with the initial rollout. However, the magnitude of these issues has surprised IT professionals and laypeople alike. To give some context, let’s look at some key findings about large-scale IT projects. In 2012, the Standish Group International conducted a survey of large-scale IT projects. They found that, of projects valued over $10 million, less than 10% were completed successfully - meaning on time and within budget. To bring this a little closer to home, the survey found that 48% of all federal IT projects had to be re-baselined. In layman’s terms, this means the projects had to be restructured because of cost overruns and/or a change in project goals. Of the projects that had to be re-baselined, it is interesting to note that more than half had to be re-baselined more than once. This does not bode well for a federally run IT project. The Department of Health and Human Services’ history is even more troublesome. As of 2008, 43% of its projects were on the Office of Management and Budget’s “watch list” because of poor performance and other management issues.

Before we can evaluate what went wrong with the website rollout, we have to evaluate what its purpose was. By most accounts, the purpose of the website is to provide a centralized location through which people seeking health insurance can shop for and secure coverage. The Department of Health and Human Services estimates a potential audience of 47 million uninsured customers; however, it set a much lower goal of actually enrolling only 7 million by March 1, 2014.

First Reported Symptoms

Before we dive into specific problems, I do want to point out some notable headlines that made the news. Most people seem to assume that the problems with the website are solely about scalability and performance; however, actual site traffic tells a different story. In the first week the website was live, it had 9.47 million users. Of those who visited the site, many experienced page wait times of greater than 8 seconds, and less than 3% could actually create an account. A 2009 Aberdeen study found that 40% of users will abandon a website after waiting as little as 2 seconds. With 8-second wait times and an inability to create an account, it is no surprise that the website is failing to meet expectations.

So the big question is, what went wrong with the site, and what could have been done to prevent it?
Before we can detail specific defects within the website, it is important to point out the general process flow that users are expected to follow when using it. That flow is depicted below:



As you can see, users are expected to go to the healthcare.gov landing page. This is the page that kicks off what the user can and should do next. Primarily, the user should create an account through the registration process. This includes setting a user ID and password and providing other pieces of personal information. It also includes an email verification step. From there, the user goes through a process to verify their identity, which involves calls to credit bureaus. From this point, eligibility for government subsidies is determined, which is largely a background operation, and then the user can request healthcare quotes. Once they receive quotes, they can choose coverage and complete the enrollment and payment processes. Once confirmed with the insurance company, they have coverage. Obviously, this is a simplified version of the overall process, but one that most people can understand and identify with. We will detail defects at various steps in this process in this and future blog posts.
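To make that flow concrete for test planning, here is a minimal sketch in Python that treats it as an ordered list of stages from which end-to-end test scenarios can be enumerated. The stage names are my own shorthand for the steps described above, not the actual healthcare.gov step names.

```python
# A minimal sketch of the user flow as an ordered list of stages.
# Stage names are illustrative only, not taken from the real system.
ENROLLMENT_FLOW = [
    "landing_page",
    "create_account",        # user ID, password, personal information
    "verify_email",
    "verify_identity",       # involves calls to credit bureaus
    "determine_subsidy",     # largely a background operation
    "request_quotes",
    "choose_coverage",
    "enroll_and_pay",
    "confirm_with_insurer",
]

def end_to_end_path(stop_at=None):
    """Return the sequence of stages a test scenario must exercise,
    optionally truncated at an intermediate stage."""
    if stop_at is None:
        return list(ENROLLMENT_FLOW)
    return ENROLLMENT_FLOW[: ENROLLMENT_FLOW.index(stop_at) + 1]

# Example: a registration-only scenario covers the first three stages.
print(end_to_end_path("verify_email"))
```

Even a toy model like this makes it obvious that a defect early in the chain (say, account creation) blocks every stage after it, which is exactly what users experienced.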

The Landing Page


Performance


The most public of the issues with the healthcare.gov website have been performance related. As we have already pointed out, users were experiencing 8+ second wait times. The government has publicly stated that the hardware was sized to handle between 50k and 60k concurrent users. The government has also stated that the code was designed based on the Medicare Part D rollout, which was built to handle 30k concurrent users. Meanwhile, the government says the site actually experienced 250k concurrent users. These conflicting statements begin to imply that performance was a concern but still treated as an afterthought in the software development process. As a side note, I should point out that when websites fail, they tend to appear capable of handling more concurrent users for the simple reason that they are not processing what they should - and in this case, returning a static error page is far easier on the servers than actually processing requests. In addition, it is doubtful that there was a standard definition of a concurrent user for this particular project. So it is no surprise that the government is reporting roughly five times as many users as expected.

As a side note, and solely based on observation (certainly uncorroborated), it would appear the government implemented a waiting room. This waiting room served as a controlled access point to limit the number of concurrent users actually exercising the servers. With this waiting room, users were not given visibility into queue lengths or wait times. Often, users exited and re-entered the queue during the registration process. This may be one of the reasons for such a low registration rate.
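For readers unfamiliar with the concept, a waiting room is essentially an admission gate in front of the application. The sketch below is my own illustration, not how healthcare.gov actually implemented it; a real version would sit at the CDN or load-balancer tier and keep its state outside application memory. It shows the basic idea, including the queue-position feedback that users apparently never received.

```python
import threading

class WaitingRoom:
    """Admit at most max_active_users at a time; queue everyone else."""

    def __init__(self, max_active_users):
        self.max_active_users = max_active_users
        self.active_users = 0
        self.queue = []                # user IDs waiting, in FIFO order
        self.lock = threading.Lock()

    def request_entry(self, user_id):
        """Admit the user if capacity allows; otherwise queue them and
        return their position so the UI can show an estimated wait."""
        with self.lock:
            if self.active_users < self.max_active_users:
                self.active_users += 1
                return {"admitted": True}
            self.queue.append(user_id)
            return {"admitted": False, "position": len(self.queue)}

    def release(self):
        """Called when an active user finishes. The next queued user, if
        any, takes the freed slot; otherwise the active count drops."""
        with self.lock:
            if self.queue:
                self.queue.pop(0)      # this user now occupies the slot
            else:
                self.active_users -= 1
```

Returning the caller's position in line is precisely the visibility that was missing; without it, users gave up or re-queued and lost their place.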

Performance Summary

  • 8+ second wait times
  • Time outs, errors, etc…
  • Software designed for 30k concurrent users
  • Hardware designed for 60k concurrent users
  • “Waiting Room” had to be implemented

The impact

  • The inability to register an account
  • The inability to review coverages
  • The addition of people to process paper applications (8000 in the first week)
  • Leading story on all major news outlets
  • Trending topic on multiple social media outlets

Testing that should have been done


Obviously, with a site of this size and complexity, comprehensive performance testing needed to be done. However, it is apparent from the government’s own reports that adequate performance testing was never performed. In fact, the first end-to-end test occurred only one month prior to the go-live date. During this end-to-end test, they discovered over 200 additional defects and crashed the system multiple times.
It cannot be stated enough: testers need adequate test environments to perform testing. For performance testing, these environments either need to be comparable to production or need to be constructed in such a way as to be analogous to production. Not only that, environment stability is fundamental to executing a good performance test.
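As an illustration of how little code a first performance probe requires, here is a minimal load-test sketch using the open-source Locust tool. The host and paths are hypothetical; the point is that a scripted ramp to tens of thousands of simulated users against a production-like environment would have surfaced the capacity gap long before go-live.

```python
# loadtest.py - a minimal Locust sketch. The endpoints are hypothetical;
# real scenarios would also script registration, identity checks, etc.
from locust import HttpUser, task, between

class SiteVisitor(HttpUser):
    # Each simulated user pauses 1-5 seconds between actions.
    wait_time = between(1, 5)

    @task(3)
    def view_landing_page(self):
        # The bulk of traffic simply hits the landing page.
        self.client.get("/")

    @task(1)
    def start_registration(self):
        # A smaller share of users begins the registration flow.
        self.client.get("/marketplace/register")
```

Run against a staging host with something like `locust -f loadtest.py --headless --users 60000 --spawn-rate 500 --host https://staging.example.gov`, this would exercise the stated 50k-60k concurrent-user target directly - and configuring it would force the team to agree on what a "concurrent user" actually means.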

Security


Most of this section is attributable to the findings of Ben Simo, which can be found on his blog: http://blog.isthereaproblemhere.com/

Security of the healthcare.gov website is of particular concern in the testing and IT security worlds; however, it is the most underreported failure of the website. Fortunately, the inability to create accounts may prove a blessing in disguise for the government. The sheer volume of security vulnerabilities in the website is truly astounding. In fact, there are so many critical security vulnerabilities that it is hard to label just one as the most critical. While I’m not a big fan of the phrase “best practices,” there are certainly bad practices to avoid. The first of those is storing email addresses and passwords unencrypted within the header traffic of the website. Second, I’d have to say it is sending email addresses and password reset tokens to third parties. And finally, there are the 471 pieces of identifiable information passed back and forth between every web page on the website, including birth date, address, name, date of marriage, non-custodial parent information, absent parent information, etc…

Security Summary

These are just some of the most egregious security errors on the website; however, there are more. A summary of some of the critical security errors is listed below:
  • 471 identifiable pieces of information stored in the browser for every web page, including birth date, address, name, marriage date, non-custodial parent information, absent parent information, etc…
  • Email addresses and password reset codes sent to third parties
  • Use of a single password reset code per account rather than a randomly generated code for each reset request (see the sketch after this list)
  • Security questions whose answers an “ex” could easily supply – for example, your favorite radio station
  • Displaying full stack trace errors to users
  • Displaying the username when an email address is provided
  • Displaying a user's security questions when a valid user ID is provided
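The reset-code item above deserves a quick illustration, because the fix is not exotic: generate a fresh, random, single-use token for every reset request. The sketch below uses Python's standard secrets module; the in-memory dictionary is purely for illustration and stands in for whatever persistent store a real system would use.

```python
# A minimal sketch of per-request, single-use password reset tokens.
import secrets
import time

RESET_TOKEN_TTL_SECONDS = 15 * 60
_pending_resets = {}   # token -> (account_id, expiry timestamp); illustrative only

def issue_reset_token(account_id):
    """Generate a new cryptographically random token for every request,
    rather than reusing one fixed code per account."""
    token = secrets.token_urlsafe(32)
    _pending_resets[token] = (account_id, time.time() + RESET_TOKEN_TTL_SECONDS)
    return token

def redeem_reset_token(token):
    """Tokens are single use and expire; both properties limit the damage
    if a token leaks to a third party."""
    entry = _pending_resets.pop(token, None)
    if entry is None:
        return None
    account_id, expires_at = entry
    return account_id if time.time() <= expires_at else None
```

Because each token is random, single-use, and short-lived, a leaked or guessed code is far less useful to an attacker than one fixed code per account.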

The impact

  • Fortunately, there have been no known disclosures of personally identifiable information
  • There is a lack of security around personally identifiable information
  • Users lack confidence that the government is securing their personal information
  • Inadvertent access to information by third parties – however, to date no known abuses have been reported

Testing that should have been done


At a minimum, basic security testing should have been performed. However, for a site handling as much personally identifiable information as the healthcare.gov website requests, more robust testing is called for and should have been done. This includes the typical white-hat penetration testing plus normal security scans. The targets of such testing should have covered the following potential vulnerabilities (a minimal automated check is sketched after the list):
  • Injection
  • Broken authentication and session management
  • Cross-Site Scripting
  • Insecure Direct Object References
  • Security Misconfiguration
  • Sensitive Data Exposure
  • Missing Function Level Access Control
  • Cross-Site Request Forgery
  • Using Components with Known Vulnerabilities
  • Unvalidated Redirects and Forwards
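Many of these checks can be automated cheaply. As one example aimed at the sensitive data exposure category, the sketch below submits known sensitive test values and then verifies that none of them come back in the response body, headers, or cookies. The URL and field names are hypothetical, not the actual healthcare.gov endpoints.

```python
# A minimal sketch of one automated "no sensitive echo" check.
import requests

# Known sensitive test values submitted during the check.
SENSITIVE_VALUES = ["S3cretPassw0rd!", "123-45-6789"]

def check_no_sensitive_echo(url, form_data):
    """Post the form and report any sensitive value that is echoed back
    in the response body, a header, or a cookie."""
    response = requests.post(url, data=form_data, timeout=30)
    surfaces = [response.text]
    surfaces += list(response.headers.values())
    surfaces += [cookie.value or "" for cookie in response.cookies]
    return [value for value in SENSITIVE_VALUES
            if any(value in surface for surface in surfaces)]

if __name__ == "__main__":
    leaks = check_no_sensitive_echo(
        "https://staging.example.gov/register",   # hypothetical endpoint
        {"password": SENSITIVE_VALUES[0], "ssn": SENSITIVE_VALUES[1]},
    )
    print("Leaked values:", leaks if leaks else "none")
```

A check like this, run against every page in the flow, would have flagged both the unencrypted credentials in header traffic and the hundreds of identifiable fields being passed between pages.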

Summary

Clearly, the performance and security issues on the landing page alone paint a very scary picture. While I have not met or spoken to any of the testers involved in testing the main components of the website, it would appear that these groups either failed to perform adequate basic testing in these areas, were prevented from doing so by project- and program-level constraints, or raised these issues only to have them fall on deaf ears. As is often quoted, “if you have time to fix it in production, you had time to fix it in development.” I certainly hope that the powers that be take stock of the inadequacy of the website and either begin to perform more rigorous testing before any further rollouts or truly listen to the issues raised by internal teams. As a side note, there is a public debate going on over whether to close the website until it is fixed or allow it to remain up while they continue to work on it. While I am not eager to engage in public debate around the politics of many of the decisions that went into rolling out the website, it is clear that there are enough vulnerabilities to warrant temporarily bringing the site down, fixing some of the more critical issues, and then bringing it back up.

In the next blog post we will examine some of the issues occurring at just the registration layer.


References used as the basis of this post:
*An anonymous user pointed out the spelling and grammar mistakes.  I apologize as I used Dragon Dictation to create this post - which was awesome - however, I posted the "pre"-proofread version instead of the corrected version.  Thanks for pointing it out.

Tuesday, October 23, 2012

My Lessons Learned from the STPCon 2012 Test Competition


I attended STPCon Fall 2012 in Miami, FL.  I was there both as a track speaker and as a first-time conference attendee.  One interesting aspect of the conference (there were others I’ll cover in another blog post) was the testing competition.  Matt Heusser, a principal consultant at Excelon Development, arranged and helped judge the competition.  A blog post of his observations can be found at . 
I participated in the competition and have my own thoughts on it.

The rules were fairly simple.  We had to work in teams of two to five.  We had four websites we could choose to test and a bug logging system to report our bugs.  We also had access to stand-in product owners.  We had two hours to test, log bugs, and put together a test status report.

My first observation is that it was a competition, but it wasn’t.  The activity was billed as a “There can be only one” style of competition.  However, and more importantly, it was about sharing more than competing.  There were competitive aspects to the activity, but the real value was in sharing approaches, insights, and techniques with testers we had never met before.  Not enough can be said about the value of peering.  Through this exercise, I was able to share a tool, qTrace by QASymphony, for capturing steps to recreate defects during our exploratory testing sessions, as well as my approach to basic web site security testing.  Although we didn’t do pure peering, it is obvious how valuable the peering approach is.

Secondly, a simple planning discussion over testing approach, plus feedback during testing, is immensely valuable: it not only spawns brainstorming, it helps reduce redundant testing.  Through this exercise, my cohort, Brian Gerhardt, and I sat next to each other and showed each other the defects we found.  We also questioned each other on things we had not tried but that were in our realm of coverage.  For side-by-side pseudo-peering, this approach worked quite well for us and led to several bugs that we may not have looked for otherwise.

Lastly, I reflected on the competition, and there are several observations I have made, as well as one startling curiosity that I think is the most important of all.  Every single team in the competition failed to do one single task that would have focused the effort, ensured we provided useful information, and removed any assumptions about what was important.  We failed to ask the stakeholders anything of importance regarding what they wanted us to test.  We did not ask if certain functions were more important than others, we did not ask about expected process flows, and we did not even ask what the business objective of the application was.  Suffice it to say, we testers have a bad habit of just testing, without direction and often on assumption.  I will be posting more on this topic in a future blog post. 

What I did notice is that testers, when put under pressure, such as in a competition or when time-bound, will fall back on habits.  We will apply the oracles that have served us well in the past and work with heuristics that make sense to us.  Often this produces results that appear to be great but in the end really lead to mediocre testing.  If we had taken the time to ask questions, to understand the application and the business behind it, our focus would have been sharper on areas of higher priority, AND we would have had context for what we were doing and attempting to do.

I will keep this blog post short; the moral of the exercise is simply to ask questions, to seek understanding, and to gain context before you begin.

Tuesday, March 6, 2012

Two Testing Giants Part Ways

Wow. What can I say? Scott Barber started a firestorm. Two great minds in the testing community, James Bach and Cem Kaner, are parting ways on an idea, context-driven testing (CDT), which they helped create and foster. The reaction on each other’s blogs has been like two parents getting divorced, with testers in the field expressing disbelief, pain, and sadness.

Part of what is driving this is a changing viewpoint for Cem. Not uncommon, since two other founding members have already broken off from CDT. To compound things, there is personal animosity between Cem and James. While I will say that I have a great deal of respect for James and Cem, and they both have great ideas, I, for one, do not really care about their parting ways. And I do not think you should care either, because the CDT principles are an undeniable truth, not dependent on any one person.

I do agree with Cem’s general statement that there is more than one CDT school – more than one camp, each with its own values and ontology (to quote James). The problem is, I disagree with using the word “school”. While it may be semantics, my issue is that “school” conjures up images of an institution with a predefined curriculum from which you cannot graduate unless you pass the courses. That sounds like certification, with which I disagree. CDT is a paradigm. I don’t like the implied argument, raised by James, that the ability to NOT follow something makes it an approach rather than a school, because the implication is that you must always follow something and that something must be your identity. This makes CDT sound like dogma and religion. For the sake of this blog post, I will still reference “schools” as schools to maintain some discussion continuity.

To me, CDT is more fundamental to the testing community than I have seen anyone else state. My belief is based on the premise that CDT is not about a new way of doing things so much as it is an acknowledgement of reality. The CDT principles are more akin to truths than principles. Even if you do not positively use the principles to gain synergies, that does not negate the principles; it does not render them false. CDT is ingrained in the fabric of testing, regardless of which “school” a tester follows. The principles are an acknowledgement of “what is”, not of “what can be”. To see what I mean, just paraphrase a few enlightened guys: “I hold these truths to be self-evident…”  Even if someone followed the “Factory School” of testing, the CDT principles would still hold true. For example, Factory testers believe testing measures progress. While that is an information point, it does not negate CDT principles such as projects unfolding in unpredictable ways or people being the most important asset. The principles of CDT can be found in these realities. Therefore, CDT fundamentally permeates all “schools” of testing. For these reasons, I do not see any significance in this parting of ways. 


Ultimately, there is an underlying “thing” that needs to be acknowledged, and that is the purpose of software testing and, consequently, of software testers. Software testing’s purpose is to identify data, consolidate it into useful information, and provide it to stakeholders so that informed decisions can be made. That purpose exists regardless of “school”. When I think about it, I see that the “schools” are really aligned to tools and to types of information for specific decisions. The schools are not aligned to the real purpose of testing. I believe this is one of the reasons there are so many issues with every “school’s” beliefs and approaches. 


Our job, as testers, is to extract data and synthesize it into information so that someone, a stakeholder, can make a decision. To do this we need tools. Those tools depend on several factors: 
  • I can only use tools I recognize – it is difficult to use something as a tool if you cannot see it in front of you 
  • Everyone has different tools – our tools are experience and knowledge based, accelerated at times, but never replaced, by technology 
  • Not every tool we know of is at our disposal – there may be great tools out there, ones we know about, that we simply cannot afford or cannot easily learn 
  • Every tool has a function – no matter the tool, it has a purpose, a way of being used, and an expectation of what using it will do 
  • Every tool can be used for any job, with varying degrees of success – you can use a butter knife as a regular screwdriver, sometimes 
  • Using tools, in both traditional and non-traditional ways, will create new tools for us – it is about learning. With all tools being knowledge based, any learning leads to new tools 


Our challenge is to realize that, as testers, our job is to extract data about a creative process and synthesize it into useful information so that stakeholders can make informed decisions; to use the tools we have available to do the best job we can. By the way, that translation of data into useful information is, and should be, influenced by the creative process (the development process), our tools, knowledge, experience, and understanding. We are researchers, inspectors, philosophers, teachers, learners, synthesizers, but most of all, we are information brokers.

Friday, July 17, 2009

Counter to "No Spec=Waste of Time"

What’s a QA team without a spec? A goddamned nuisance and a waste of time, that’s what.

In this article, the author rants about Elder Games' Asheron’s Call 2 (an MMO) defect issues more than anything else. The author's opening sentence, quoted above, points to the emotional bias in his succeeding thoughts. Pretty powerful emotional stuff there. Distilling his gripes, I find he takes issue with:

1) Low Priority Bugs in the Bug Tracking System (noise)
2) Too many severe bugs released to production
3) Not enough time for the volume of work

The rest of his post is simply an attempt at assigning causation for those issues. It is his causation analysis that I take issue with. A lack of a formal written specification does not result in a higher volume of "noise" in the bug tracking system. Testers innately have other references, or oracles, by which to evaluate software. Those can be prior experience, technology experience, genre experience, etc... Of course, any material, even email, can serve as a reference point. In systems where the oracles used by the testing team are nearly universal to both the team and the software - more common in simple systems - very little documentation is needed to have a successful test effort with normal noise levels. In systems where oracles are not universal - more common in complex systems - you will see more noise in the bug tracking system. It isn't the lack of documentation that is the problem; it is the lack of universally accepted oracles. There are ways to achieve that without volumes of documents, such as collaboration or pair testing with a developer in the group. In the end, these are just ways to communicate and agree on a common oracle; after all, documentation is just a proxy for, or the remembered result of, a discussion. The failure to recognize the gap in common oracles may result in increased noise in the bug tracking system and a reduction in severe bugs being caught.

Another aspect "noise" in the bug tracking system that wasn't addressed is the all too common problem of not monitoring what is being logged. All too often, testers log bugs and they don't get reviewed by anyone until some coordinated meeting. The span between the meetings represents the window of opportunity to log bugs where monitoring does not occur (this does not occur in all places and/or is not all that severe everywhere). There are two aspects I would like to address:

1) Approval signals importance - Testers log bugs they find important, again based on their own oracles. If it wasn't important to them, they would never see it as a bug. Importance must be defined in terms of a value judgment and not severity. Approval is defined in terms of consensus acceptance and not a formal status assignment. Agreement by any stakeholder that a logged item is a bug signals to the team, "Hey, go ahead and find more of those because we like them." I believe this because it is societal behavior to fear rejection and pursue acceptance. So, approved bugs breed more similar bugs. If a strong severity/priority system is NOT used, testers can be led to believe some types of bugs are more significant/valuable than others. Without this correction, and in combination with approval, it is easy to establish conditions that magnify noise in the system.

2) Number of bugs - Bugs logged into a bug tracking system form another oracle for a tester (see #1). Because of that, the value of those bugs tends to decrease as more bugs are added to the system. In the face of numerous bugs, especially uncategorized ones whose status never changes, testers tend to skim the list for cursory information instead of diving deep to understand what has and has not been covered and uncovered. Therefore, a growing volume of unmanaged bugs degrades the value the repository was meant to provide to testers in the first place.

Finally, the issue of too much work and not enough time is simply a reality of software development. There are numerous estimation models, development methodologies, tools, etc... that all center around this specific issue of how to get more done with less money, time, and fewer resources. Just because this condition exists doesn't mean there aren't ways to still achieve a solid development and testing effort. At the end of the day, if you don't communicate and agree on common oracles, you will always incur more work to overcome this obstacle, because software development is always collaborative, no matter how hard you try to fight it.

Monday, July 6, 2009

Automation Pitfalls Snarls an Automation Vendor

I like test automation tools. I like the challenge, I like the hobbyist programmer they bring out in me, and I like to use them, but only when they are useful. Automated tests are not sentient. They are not capable of subjective evaluations, only binary ones. Not only that, they are only able to monitor what they are told to monitor. These facts are often overlooked in the name of faster, better, higher-quality testing. See the advertising quote below:

Benefits include ease of training (any member of your team can become proficient in test automation within a few days), staffing flexibility, rapid generation of automated test suites, quick maintenance of test suites, automated generation of test case documentation, faster time to market, lower total cost of quality, greater test coverage, and higher software quality.

This is the advertising for a tool I ran across today called SmarteScript, by SmarteSoft. I saw their ad and downloaded a demo. The first thing I did was attempt to run through the tutorial to become acquainted with how the tool operates. Like most commercial for-profit tools, they have a rudimentary demo site that is designed to show off their product's bells and whistles. The basic tutorial was to learn the web-based objects (textboxes, buttons, etc...). Then, with their nifty Excel-like grid, add in some values for the appropriate objects. Pretty simple. I used their tool to learn a dropdown box on their demo site, per their instructions. *Emphasis added because it is possible to learn a table cell around the dropdown, the dropdown arrow graphic, or the text-only portion of the dropdown list without learning the dropdown list itself, as required. They had specific instructions on how to do this as well as how to tell if you screwed up. However, when going through the tutorial with IE8, I noticed that the tool appeared to learn the dropdown only as static text (i.e., a label). I tried and tried to get it to recognize the dropdown as a dropdown, but it would not. I tried playing the script back just to see if the tool would overcome its own learning. But alas, it failed. I sent my observation to the support team. To their credit, I received an email back the same day stating they were able to reproduce the problem and their development team would look into it further.

I find it ironic that a vendor selling this as a way to achieve faster time to market, lower total cost of quality, greater test coverage, and higher software quality ended up releasing a tool that is the contrary of all that. I would be remiss if I didn't point out that I doubt I can become proficient with a tool in a few days when it doesn't work on their own demo site; so strike down another claim. Most importantly, I think this points out that test automation is not a silver bullet... nor a cheap one.

Thursday, July 2, 2009

Redefining Defects

I recently did a guest blog post about writing defect reports that include a value proposition, which is just another way of stating the impact of the defect. While writing the post, one thing occurred to me: the term "defect" is not exactly correct. Some are defects, some are misunderstandings, and some are suggestions. To add to the issue, Cem Kaner has recognized the legal implications of using the term defect (slide 23).

So, what exactly should defects be called: bugs, defects, issues, problems, errata, glitches, noncompliance items, software performance reports (SPRs), events, etc…? To understand what they should be called, we need to understand what “they” are.

The primary purpose of defect reports is to point out what we, as testers, believe to be notable information about the application. Typically, a defect report is about something that doesn’t work the way we think it should, but it can also be a suggested improvement. As always, there is an explicit or implicit reference against which the defect is judged and evaluated. For software, it can be a requirements document, UI standards, historical experience, etc.

What is an observation? There are many dictionary definitions, but for our discussion, let us use Merriam-Webster’s third entry, which is “an act of recognizing and noting a fact or occurrence often involving measurement with instruments”. The Free Dictionary Online has a similar definition (its second) of observation as “a detailed examination of something for analysis, diagnosis, or interpretation.” Isn’t that what we do as testers? We perform a series of actions to elicit a response and compare that response against a set of expectations. We then log anything we consider out of line with our expectations. I propose that is exactly what we do in the testing field. Therefore, defects are really observations. Perhaps we should start calling them that. It eliminates negative-sounding words, eliminates legal concerns and, most of all, better matches our actions. I propose we call them observations. Thoughts?

Thursday, June 18, 2009

Certification - A money maker, but not for you





I received this advertisement via email. It is funny in its misconceptions, but it is sadder that so many people buy into it (and by buy, I mean spend $$$).
Okay, so let's start with the second line of this advertisement:

"If your team is conducting ad hoc, informal tests with little guidance or planning, the quality of the end product can be severely jeopardized—negatively affecting your bottom line"

This nearly represents everything that is wrong with the ISTQB certification. The quality of the end product is not jeopardized by informal testing, a lack of test planning, or a lack of guidance. In reality, quality is a relationship, the simplest form of which is the value of the product to the stakeholders (those who matter).
But this next statement in their advertisement does represent everything that is wrong with the ISTQB certification:

"The best way to be certain that you are providing customers with quality software is to make sure your team of testers is certified."
Really? I thought it was by providing something they value, usually something built to meet their wants and/or needs. If all I need to do is put a gold-embossed sticker on it, then so be it. Here you go.









All your software is now of high quality. Oh, by the way, ISTQB, that will be $1,995 + $250 ($1,995 for training so that you know how to use the sticker, and $250 for the right to use the sticker; let's throw in $9.95 for shipping and handling as well). So, they don't have a clue about the testing industry, nor about quality. But hey, let's see what their mission statement says. Maybe that will shed some light on what they are trying to do.

It is the ISTQB's role to support a single, universally accepted, international qualification scheme, aimed at software and system testing professionals, by providing the core syllabi and by setting guidelines for accreditation and examination for national boards

So, their mission is to create a certification scheme (their word, not mine) and provide the materials and exams for that certification. It is nothing less than a money-making scheme, and it is right there in their mission statement. They do not care about quality or testing.

I do not accept the ISTQB as part of any community of software testers. The ISTQB is a business pursuing its own agenda. I know that may sound a bit harsh, but consider this: as Michael Bolton pointed out, in October 2008 the ISTQB announced 100,000 certified testers. Each of those testers had to pay a fee to take the exam. In the U.S., this fee is $250 (entry level), and I believe it is around $100 in India. That means they have made between $10 million and $25 million in revenue on certifications alone in the past five years. So far, they are succeeding at their mission statement.