However, the three of us who put together the upgrade still managed to completely overlook this problem under the stress of trying to fix the malleability security issue as fast as possible. The moral of the story is that, security issue or not, we need a better process for live upgrades:
We are reviewing our options, but the conclusion is clear. DevShares needs to come out as soon as possible because it is the most likely way we would have caught this bug.
This fix was rushed out because, once the existence of the security flaw was publicly disclosed, we felt extreme time pressure: any technically competent attacker reading our public discussions could have used the unpatched flaw to do serious damage.
We need to do a better job of telling all internal developers and community contributors not to discuss security vulnerabilities over public channels until a fix is deployed. A little more caution along these lines would quite possibly have resulted in a patch developed over a longer time frame, with more thorough testing, under less pressure.
The next time a security vulnerability is disclosed before a patch is available, we need to weigh the risks of a botched fix more carefully when deciding how much time to spend writing and testing a patch. We don't want the fix to cause more damage than the exploit.
We also need to address technical debt on the testing side. DevShares will help; so will more tests and improvements to our testing infrastructure. I've been working on the latter.
I think there's little value in assigning blame to particular people. I'm pretty sure that those who screwed up (myself included) have realized it and have figured out what they need to do differently next time. Overall, I'm not even sure we can call human error the root cause; rather, it's the lack of a well-defined process for dealing with security vulnerabilities. A process for code review and testing of release candidates would also be useful.