
Accused Murderer Wins Right To Check Source Code of DNA Testing Kit (theregister.com)

"A New Jersey appeals court has ruled that a man accused of murder is entitled to review proprietary genetic testing software to challenge evidence presented against him," reports The Register.

Long-time Slashdot reader couchslug shared their report: The maker of the software, Cybergenetics, has insisted in lower court proceedings that the program's source code is a trade secret. The co-founder of the company, Mark Perlin, is said to have argued against source code analysis by claiming that the program, consisting of 170,000 lines of MATLAB code, is so dense it would take eight and a half years to review at a rate of ten lines an hour. The company offered the defense access under tightly controlled conditions outlined in a non-disclosure agreement, which included accepting a $1m liability fine in the event code details leaked. But the defense team objected to the conditions, which they argued would hinder their evaluation and would deter any expert witness from participating...
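The eight-and-a-half-year figure follows from simple arithmetic if one assumes a single reviewer working about 2,000 hours a year. That working-year figure is an assumption; the article gives only the line count and the review rate. A quick sketch:

```python
# Reproduce the review-time estimate attributed to Cybergenetics' co-founder.
LINES_OF_CODE = 170_000      # figure quoted in the article
LINES_PER_HOUR = 10          # review rate quoted in the article
WORK_HOURS_PER_YEAR = 2_000  # assumption: roughly 50 weeks x 40 hours

hours = LINES_OF_CODE / LINES_PER_HOUR   # total reviewer-hours
years = hours / WORK_HOURS_PER_YEAR      # calendar years for one reviewer
minutes_per_line = 60 / LINES_PER_HOUR   # time spent on each line

print(f"{hours:,.0f} hours, {years} years, {minutes_per_line} min/line")
```

The estimate also assumes every line is read once, in order, at a constant pace by one person; it shrinks linearly with the number of reviewers.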

Those arguing on behalf of the defense cited past problems with other genetic testing software such as STRmix and FST (Forensic Statistical Tool). Defense expert witnesses Mats Heimdahl and Jeanna Matthews, for example, said that STRmix had 13 coding errors that affected 60 criminal cases, errors not revealed until a source code review. They also pointed out, as the appeals court ruling describes, how an FST source code review "uncovered that a 'secret function...was present in the software, tending to overestimate the likelihood of guilt.'"

EFF activists have already filed briefs in multiple courts "warning of the danger of secret software being used to convict criminal defendants," reports an EFF blog post.

"No one should be imprisoned or executed based on secret evidence that cannot be fairly evaluated for its reliability, and the ruling in this case will help prevent that injustice."
  • Good (Score:5, Insightful)

    by geekmux ( 1040042 ) on Saturday February 06, 2021 @06:42PM (#61035436)

    "No one should be imprisoned or executed based on secret evidence that cannot be fairly evaluated for its reliability, and the ruling in this case will help prevent that injustice."

    It doesn't matter how good you are, or how rich you are. This is a good thing.

    "...argued against source code analysis by claiming that the program, consisting of 170,000 lines of MATLAB code, is so dense it would take eight and a half years to review at a rate of ten lines an hour."

    Code complexity is a bullshit defense. Protect your assets and liability, yes. But stand by your damn code.

    • The other reason it's a bullshit defense is they're pretending the way to find flaws in code is to read through it linearly scrutinizing each line for 6 minutes. There are strategies and tools to identify problem spots, and test cases to be written.

      • I'm not familiar with Matlab, but I wonder why anyone would use such a system, with its very limited developer tools (compared to, say, Java), to write such a large app. It almost sounds like the provider themselves don't know what their own code does.
        • In my limited experience, i.e. what I've gathered over the years, MATLAB is surprisingly often what is used to start a prototype or research-type program, which may then become entrenched in the product. My guess would be that MATLAB is used because it was used during university studies and is therefore familiar. Then it's built upon, layer upon layer, and after twenty years it's 170,000 lines of code. By then it's a lot of effort to change the programming language.

          Cybergenetics says [cybgen.com] about M

          • There are many ways to attack a software problem.

            I'm not familiar with DNA sequences, but 170k is a lot of logic. I would have thought that finding patterns, while complex, is generally a matter of recursion and repeated attempts at finding sequences. Shouldn't the code be significantly fewer statements than 170k?

            I'm going to make another statement: something of that size, on a limited platform not designed for structure and tooling, sounds like a plain dumb choice.
          • I worked at a startup where this was exactly the case. It was started by a professor who came up with a very neat solution to an image processing task. Anything algorithmically complex was written by him in MATLAB, translated to C through MATLAB Coder and compiled, but development continued in MATLAB. His code was called from some C programs on a Linux box. The GUI was written in Java and ran on a laptop or tablet.

            I actually inherited something similar at this startup. We were working with another researcher who gave us

        • Re: (Score:3, Informative)

          I'm not familiar with Matlab, but I wonder why anyone would use such a system, with its very limited developer tools (compared to, say, Java), to write such a large app.

          Depends on what you're developing. I'm going to neither defend nor criticise MATLAB per se. What it's built for is doing numerical and/or scientific code aimed at people whose expertise is the analysis, not programming, and it has a ton of tools for that. It has a lot of the analysis functions built in, and the MathWorks do care about recurring cus

          • > Depends on what you're developing. I'm going to neither defend nor criticise MATLAB per se. What it's built for is doing numerical and/or scientific code aimed at people whose expertise is the analysis, not programming, and it has a ton of tools for that.

            I think you just agreed with me: if it's a bad development platform, then it must have quality issues.

            > So if your expertise is the scientific side, not general software engineering, it fills a pretty good niche by isolating you from all those de
        • by teg ( 97890 )

          I'm not familiar with Matlab, but I wonder why anyone would use such a system, with its very limited developer tools (compared to, say, Java), to write such a large app. It almost sounds like the provider themselves don't know what their own code does.

          Because Matlab is really good at maths, statistics, etc - which is the major part of what they are doing here. For some examples, look at a basic tutorial [mathworks.com]. There is an open source tool that is very similar in many ways - GNU Octave [gnu.org].

        • I would put money on that bet. Big money.

          For the last fifteen years of my career, my job was to take systems whose workings no one understood and reverse-engineer the design.

          From experience I can tell you that you don't go in and examine every line.
      • Plus, the defendant shouldn't even have to show the code is flawed.

        The burden of proof is on the prosecution to show that it is NOT flawed.

        If the code is opaque, crappy spaghetti code (as the prosecution seems to be claiming), that in itself should be grounds for overturning the conviction.

        A production system implemented in MATLAB?!!?! Oh, God.

    • A code review is not convincing proof. Anybody can look through a codebase and find lots of things that violate best practices and make it sound horrible.

      The best kind of proof is making accurate predictions. If this tool can repeatedly pick a needle out of a haystack with high precision and recall, then it does work. Bottom line.

      • by cas2000 ( 148703 )

        here's my algorithm:

        for each item in haystack:
                accuse item of being a needle
                send item to needle jail

        this will always find the needle if there is one to be found. every time. guaranteed. so, it "does work. bottom line."

        who gives a shit about false positives, anyway?
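cas2000's point can be made concrete with precision and recall. A minimal Python sketch, using a hypothetical haystack of 10,000 items with a single needle:

```python
def precision_recall(flagged, needles):
    """Precision: share of flagged items that really are needles.
    Recall: share of real needles that were flagged."""
    true_pos = len(flagged & needles)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / len(needles) if needles else 0.0
    return precision, recall

# Hypothetical haystack of 10,000 items; item 42 is the only needle.
haystack = set(range(10_000))
needles = {42}

# cas2000's algorithm: accuse every item and send it to needle jail.
flagged = set(haystack)
p, r = precision_recall(flagged, needles)
false_positives = len(flagged - needles)
print(f"precision={p:.4f}, recall={r:.2f}, false positives={false_positives}")
```

Perfect recall on its own says nothing; any validation claim has to report the false positive side as well.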

      • Re:Good (Score:4, Insightful)

        by dryeo ( 100693 ) on Sunday February 07, 2021 @12:40AM (#61036306)

        A magnet will pick all the needles out of a haystack, both guilty and innocent, and if they stop at the first needle out of many, it looks like they can pick the needle out of the haystack.
        So the question then becomes: how many needles are (or were) in the haystack?
        One way to test, and how it is done in my country for things like blood tests for DUI, is to let the defendant use a different, independent testing agency.

          A magnet will pick all the needles out of a haystack, both guilty and innocent, and if they stop at the first needle out of many, it looks like they can pick the needle out of the haystack.

          And how does a casual observer tell the difference between needles and hay? If the badly coded magnet also collects hay it looks like it's working too.

          • If it picks up everything it's probably broken. Usually when they run DNA tests it's against a huge database (and now that people have done all of those ancestry tests it's not always just a criminal database) and get a limited number of results. It really isn't hard to determine what percentage of the sample population ends up matching.
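The percentage-of-population check described above amounts to the expected number of coincidental matches: database size times the random match probability (RMP). A sketch, assuming independent, unrelated profiles (relatives and partial profiles make things worse); the database size and match probabilities are hypothetical:

```python
def coincidental_matches(db_size, rmp):
    """Expected number of unrelated profiles in a database that match by chance,
    and the probability of at least one such match, assuming independence."""
    expected = db_size * rmp
    p_at_least_one = 1 - (1 - rmp) ** db_size
    return expected, p_at_least_one

# Hypothetical: a 5-million-profile database searched with two different
# random match probabilities (a full profile vs. a degraded partial one).
for rmp in (1e-7, 1e-5):
    expected, p_any = coincidental_matches(5_000_000, rmp)
    print(f"RMP {rmp:g}: expect {expected:g} chance matches, "
          f"P(at least one) = {p_any:.3f}")
```

Even a one-in-ten-million profile gives roughly a 39% chance of some unrelated hit in a database that size, which is why a "match" found by trawling a database needs different treatment from a match against a single suspect.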
    • Re:Good (Score:5, Insightful)

      by trawg ( 308495 ) on Sunday February 07, 2021 @01:14AM (#61036354) Homepage

      Code complexity is a bullshit defense. Protect your assets and liability, yes. But stand by your damn code.

      Someone telling me their code is complex and dense and thus hard to understand is basically telling me that it is almost certainly unreliable and no one really knows what it's doing in a lot of places.

      • "Someone telling me their code is complex and dense and thus hard to understand is basically telling me that it is almost certainly unreliable and no one really knows what it's doing in a lot of places."

        You realize this is the same stupid opinion used to invalidate everything scientific by those unqualified to judge it, right?

        "If I can't understand it, it must not be right." Well, what if the code is really complex, dense, and hard because the problem itself is complex, dense, and hard?

        Yeah, yeah, yeah. The

        • "If I can't understand it, it must not be right."

          No. That is not what the GPP said at all.

          If a scientist claims that X is true, saying that I don't understand it is not a valid counter-argument.

          Showing that THE SCIENTIST doesn't understand it, is certainly a valid counter-argument.

          If the people who designed the device don't understand how it works, then we shouldn't be using it to send people to prison.

        • "If I can't understand it, it must not be right." Well, what if the code is really complex, dense, and hard because the problem itself is complex, dense, and hard?

          You do what this defense would do, get an expert to look at it. Your bogus argument hinges on the actual defendant doing the examining. He wouldn't be. It would be an expert.

      • Including themselves. Their rejection of your looking is just a flag that they know damn well they can't adequately explain how their code is performing the function.
    • Code complexity is a bullshit defense. Protect your assets and liability, yes. But stand by your damn code.

      The question is how the f*ck they managed to reach such code complexity in the first place. Genetic analysis is Bayes. In fact, it was the first major application of Bayesian statistics, before insurance, financial markets and everything else. It is not particularly complex, and I would really like to know how one manages to belch out 78K lines of Matlab for what is generally expressed in fewer than 6K (including data handling).
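For the simplest textbook case, a clean single-source profile, the Bayesian core this comment refers to is indeed short: multiply Hardy-Weinberg genotype frequencies across loci to get a random match probability (RMP), take its reciprocal as the likelihood ratio (LR), and update the prior odds. The allele frequencies and prior below are made up, and this sketch ignores mixtures, allele dropout and peak heights, which is where probabilistic genotyping software gets complicated:

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if q is None else 2 * p * q

# Hypothetical allele frequencies at three loci (not real population data).
loci = [
    (0.10, 0.20),  # heterozygous: 2pq = 0.04
    (0.05, None),  # homozygous:   p^2 = 0.0025
    (0.25, 0.10),  # heterozygous: 2pq = 0.05
]

# Random match probability: product rule across (assumed independent) loci.
rmp = prod(genotype_frequency(p, q) for p, q in loci)

# For a clean single-source profile matching the suspect, LR = 1 / RMP.
lr = 1 / rmp

# Bayes' rule in odds form: posterior odds = prior odds * LR.
prior_odds = 1 / 10_000  # hypothetical prior
posterior_odds = prior_odds * lr
print(f"RMP = {rmp:.2g}, LR = {lr:,.0f}, posterior odds = {posterior_odds:.3g}")
```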

      Further to this, the tool should be inadmissible as a tool to start off with. In th

      • by hankwang ( 413283 ) on Sunday February 07, 2021 @05:19AM (#61036634) Homepage

        Well, it is Matlab after all. Matlab discourages you from refactoring code, both because of the performance penalty of function calls and because each function needs to be in a separate file, sharing the same namespace with thousands of built-in functions.

        The Matlab language and builtin functions have bizarre ways of handling both edge cases and reasonable use cases:

        * Spaces act as commas: [1, 2 -3 + 4] is a 3-element array.
        * If a function returns an array, you cannot index the return value without a temporary variable: tmp=foofunc(123); a=tmp(1). Not a=foofunc(123)(1).
        * The character ' is both a string delimiter and a unary conjugate transpose operation.
        * i and j are predefined as the imaginary constant, but you can use them as a loop variable. Printing a complex variable can result in '1 + i', but don't try to copy that into a variable initialization.
        * If d=[2, 3, 4] then zeros(d) will initialize a 3D array. But if d=2 or d=[2], then you'll get a 2x2 array. Code that needs to handle input data with unknown dimensions must be sprinkled with if statements to check whether it is the special case of a 1D array.
        * A string a='xyz\n' contains a literal backslash. Only some string functions will convert it to a newline.
        * Operations involving floats and ints will demote the result to the lower precision datatype. int8(25)*10.0 == 127 WTF.
        * Functions fail silently. If you save data in a format that allows 2 GB per record and you try more, you end up with files having missing records and a warning on stdout. Great fun in batch jobs that take 24 hours to finish and that work fine with small test datasets. The workaround made me run into the next one.
        * The only built-in way to write data to a file can only write records with unique names matching variables in the present scope. There is no way to write an array row to a file without first assigning it to a variable.
        * Though Matlab is prized for vectorized handling of array data, it cannot do linear operations on high-dimensional arrays; where in numpy you'd use c=np.einsum('ijkl,kij->il', a, b), you'll spend 5 lines of code reordering the array layout or writing out an explicit nested loop.

        I sometimes have to deal with legacy Matlab code and each and every time I'm screaming "WTF you can't be serious" as I discover another quirk like these.
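Two of the quirks above, sketched in plain Python for readers without Matlab. The first function emulates by hand (it does not call Matlab) my understanding of Matlab's conversion to int8: the int-times-double product is computed in double, then rounded half away from zero and saturated to [-128, 127] rather than wrapped. The second writes out the loops behind the quoted np.einsum('ijkl,kij->il', a, b) call, which is roughly what you end up hand-coding in Matlab:

```python
import math

def matlab_int8(x: float) -> int:
    """Emulate converting a double to int8 as Matlab does: NaN becomes 0,
    out-of-range values saturate to [-128, 127] instead of wrapping,
    and ties round half away from zero."""
    if math.isnan(x):
        return 0
    x = max(-128.0, min(127.0, x))  # saturate first (also handles +/-inf)
    whole = math.floor(abs(x))
    if abs(x) - whole >= 0.5:       # round half away from zero
        whole += 1
    return int(math.copysign(whole, x))

def wrap_int8(v: int) -> int:
    """C-style two's-complement wrap-around, for comparison."""
    return (v + 128) % 256 - 128

# int8(25)*10.0: computed in double (250.0), then converted back to int8.
print(matlab_int8(25 * 10.0))  # saturates at 127
print(wrap_int8(250))          # a wrapping integer type would give -6 instead

def contract_ijkl_kij_to_il(a, b):
    """Explicit loops equivalent to np.einsum('ijkl,kij->il', a, b):
    c[i][l] = sum over j and k of a[i][j][k][l] * b[k][i][j]."""
    n_i, n_j, n_k, n_l = len(a), len(a[0]), len(a[0][0]), len(a[0][0][0])
    c = [[0.0] * n_l for _ in range(n_i)]
    for i in range(n_i):
        for l in range(n_l):
            c[i][l] = sum(a[i][j][k][l] * b[k][i][j]
                          for j in range(n_j) for k in range(n_k))
    return c

# All-ones test tensors: a has shape (2, 3, 4, 5), b has shape (4, 2, 3).
a = [[[[1.0] * 5 for _ in range(4)] for _ in range(3)] for _ in range(2)]
b = [[[1.0] * 3 for _ in range(2)] for _ in range(4)]
c = contract_ijkl_kij_to_il(a, b)  # shape (2, 5); every entry is 3 * 4 = 12
```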

        • After reading this, the stratospheric rise in the popularity of TensorFlow starts making much more sense :)
        • by serviscope_minor ( 664417 ) on Sunday February 07, 2021 @07:06AM (#61036788) Journal

          Matlab has some quirks, not gonna lie.

          The main problem, though, is that the code authors are generally experts in some scientific or engineering domain, not in programming. Mostly it's used by people who are self-taught, or who took a couple of courses and then taught themselves all they needed to know to do their science, which can get quite large and complex. Such people generally don't have any professional software engineering experience.

          If you do have the right background, then you can write good Matlab code. It's very rare to find someone who (a) can write actual production-quality code and (b) has the experience and knowledge to apply that to some niche scientific problem. And those people are often extremely expensive because it's such a rare combined skillset.

      • by AmiMoJo ( 196126 )

        Can't the defence have their own DNA analysis done?

        In the UK people convicted on fingerprint evidence have been subsequently found innocent by getting their own experts to do an analysis saying they do not match, contradicting the prosecution's expert.

        • by teg ( 97890 )

          Can't the defence have their own DNA analysis done?

          In the UK people convicted on fingerprint evidence have been subsequently found innocent by getting their own experts to do an analysis saying they do not match, contradicting the prosecution's expert.

          Matching DNA if you have a good quality sample is easy, and gives a high quality match. Of course, there are always weird counterexamples [scientificamerican.com]. Tools like this are used when you don't have high quality samples - and you end up with a lot of statistics for making things more or less likely. An interesting article on this [nih.gov].

          • by AmiMoJo ( 196126 )

            Cases have collapsed here because of dodgy techniques like "amplification" producing results that are unreliable. In one case it emerged in court that the match pointed to at least 5 other people in the police database alone.

      • by gTsiros ( 205624 )

        you have no idea do you, of the levels of shittiness in code, both in quantity and quality

        i have seen a single function that is on the order of tens of thousands of lines of code

        which does "serialize object to xml"

        by hand.

  • And what about third-party code they can't give out? Those vendors have their own NDAs that say they can't do this.

    • Then the prosecution shouldn’t have built their case on evidence that is either inadmissible or indefensible. Relying on such software is like relying on a process for managing evidence that doesn’t maintain a proper chain of custody: you just don’t do it.

    • All NDAs I've ever seen include a clause allowing for the release of information if a court orders it. There probably is another clause that requires the third party to be notified and allowing that third party to make its own case to the court that they shouldn't have to release it. But the courts definitely override NDAs.

    • Then that evidence is thrown OUT. If you sign such an NDA and insist on keeping it, you are declaring that your software will NEVER be used in a court of law.

      The rights of the accused outweigh ALL business interests. You can not say "I have proof that he is guilty, but I can't show it to you because I signed an agreement."

      The Constitution says you can confront your accuser/witnesses. This obviously includes software that analyzes evidence, otherwise corrupt people will write some software for DNA tests w

  • Self-driving cars need this in case there is a criminal case and the car owner / driver?? is facing charges.

    Also, the Uber self-driving car operator needs to demand ALL source code and logs, even the code of the car that Uber modded.

  • by aberglas ( 991072 ) on Saturday February 06, 2021 @06:51PM (#61035462)

    Surely the issues are with the sample collection, preparation, and chemical analysis.

    What comes out is a pile of data. The defense should be able to check that with different software. If the results don't match, then the reason needs to be investigated.

    But it is a concern when a vast amount of MATLAB code is required that would take "eight and a half years" to review -- how long did it take to write? And how do you unit test such code?

    Or better, test the entire system. Put lots of samples through from one end to the other, with the source not known to the testers, and see how they go.
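The cross-check with different software suggested in this comment is essentially differential testing: run two independently written tools on the same input data and flag cases where their likelihood ratios (LRs) disagree by more than some tolerance. A sketch with hypothetical tools, case IDs and LR values (all assumed positive), using one order of magnitude as an arbitrary threshold:

```python
import math

def flag_disagreements(results_a, results_b, max_log10_gap=1.0):
    """Compare LRs from two independent tools on the same cases.
    Returns the IDs of cases where the tools differ by more than
    max_log10_gap orders of magnitude."""
    flagged = []
    for case_id in sorted(results_a.keys() & results_b.keys()):
        gap = abs(math.log10(results_a[case_id]) - math.log10(results_b[case_id]))
        if gap > max_log10_gap:
            flagged.append(case_id)
    return flagged

# Hypothetical LRs for the same three cases from two different tools.
tool_a = {"case1": 1e6, "case2": 3e4, "case3": 2.0}
tool_b = {"case1": 4e5, "case2": 2e1, "case3": 1.5}

print(flag_disagreements(tool_a, tool_b))
```

Comparing on a log scale treats a factor of ten the same whether the LR is in the hundreds or the millions; a flagged case says only that at least one tool is wrong, not which one.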

    • by HiThere ( 15173 )

      That's probably a good approach, but the source code of any software used to make decisions by the justice system should be open to examination. Period. No excuses. And no reason needing to be given.

    • Or better, test the entire system. Put lots of samples through from one end to the other, with the source not known to the testers, and see how they go.

      That makes a lot of sense if you're looking for the truth, but that's not the point of an adversarial legal system.

      i.e., the defense wouldn't want to actually do that because, more likely than not, the test cases would work properly (even if there is some issue with the code, which I suspect there is, especially considering how seemingly unnecessarily voluminous it is, i

    • by Kludge ( 13653 )

      This could be a software problem because DNA sequencing requires assembling and matching of smaller DNA sequences. To a certain degree DNA matching is a probabilistic task, and different software can yield different results.

      • by pjt33 ( 739471 )

        Not to mention the model which looks at the assembled matches and generates a likelihood ratio for the two samples being from the same person vs not. I once worked on a project to implement a frontend for some fingerprint matching analysis written by an academic: it was impossible to test properly because the backend analysis had a Monte Carlo component and the LR could vary by an order of magnitude between submissions of identical data. As you might imagine, I'm now rather cynical about forensic science.
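The effect described here is easy to reproduce with a toy model (not the fingerprint software in question): if an LR is taken as the reciprocal of a small probability estimated by Monte Carlo, too few samples make the result swing by an order of magnitude between runs on identical input. The threshold, sample sizes and seeds below are arbitrary choices for illustration:

```python
import math
import random

def mc_tail_probability(threshold: float, n_samples: int, seed: int) -> float:
    """Monte Carlo estimate of P(Z > threshold) for a standard normal Z."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n_samples))
    return hits / n_samples

THRESHOLD = 3.5
exact = 0.5 * math.erfc(THRESHOLD / math.sqrt(2))  # P(Z > 3.5), about 2.3e-4

for n in (10_000, 200_000):
    # Same "evidence" every time; only the random seed changes between runs.
    estimates = [mc_tail_probability(THRESHOLD, n, seed) for seed in range(5)]
    # Toy LR = 1 / p; zero hits would make the LR infinite.
    lrs = [1 / p if p > 0 else math.inf for p in estimates]
    formatted = ", ".join(f"{lr:.3g}" for lr in lrs)
    print(f"n={n}: LRs {formatted} (exact {1 / exact:.3g})")
```

Fixing the seed makes runs repeatable, but only a larger sample (or a better rare-event estimator such as importance sampling) makes them accurate.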

  • In a DUI case where this happened, the code turned out to be full of bugs, and I think they got off.

  • Outrageous (Score:5, Insightful)

    by sgage ( 109086 ) on Saturday February 06, 2021 @06:59PM (#61035480)

    The notion that the decision to put someone away for murder, or possibly have them executed, could be entrusted to closed-source software is outrageous on the face of it. And the defense that 'it's a trade secret' is doubly outrageous.

  • by joe_frisch ( 1366229 ) on Saturday February 06, 2021 @07:07PM (#61035516)
    The impact of a failure of this software is very large. Is there a qualification process? What is the manufacturer's liability if the code is found to be flawed in a way that results in a wrongful conviction? In a sensible world they would be given 10K samples to test for false positives and negatives. This would include tests to see if the false positive rate was higher for people of the same ethnicity, especially in groups with limited genetic diversity.
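A minimal sketch of the stratified qualification test proposed above: tally false positive and false negative rates separately for each population group, so that a higher error rate in one group is not hidden by a good overall average. The groups and counts are hypothetical; a real validation would need far more trials, including pairs of related individuals:

```python
from collections import defaultdict

def error_rates_by_group(trials):
    """trials: iterable of (group, truly_same_source, tool_says_match) tuples.
    Returns {group: (false_positive_rate, false_negative_rate)}."""
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, same_source, called_match in trials:
        c = counts[group]
        if same_source:
            c["pos"] += 1
            c["fn"] += not called_match   # missed a true match
        else:
            c["neg"] += 1
            c["fp"] += called_match       # declared a match that isn't one
    return {
        g: (c["fp"] / c["neg"] if c["neg"] else float("nan"),
            c["fn"] / c["pos"] if c["pos"] else float("nan"))
        for g, c in counts.items()
    }

# Hypothetical blind trials: 200 sample pairs per group.
trials = (
    [("A", False, False)] * 99 + [("A", False, True)] * 1
    + [("A", True, True)] * 98 + [("A", True, False)] * 2
    + [("B", False, False)] * 95 + [("B", False, True)] * 5
    + [("B", True, True)] * 100
)
for group, (fpr, fnr) in sorted(error_rates_by_group(trials).items()):
    print(f"group {group}: false positive rate {fpr:.1%}, false negative rate {fnr:.1%}")
```

Pooled together, the two groups show a 3% false positive rate, which hides the fact that group B's rate is five times group A's.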
  • Let's see. Oh, hell, I don't know, perhaps every voting machine? Not just the source code but also a complete build procedure including the entire toolchain so you could compare the binary with what's actually installed on the machine to ensure that there are no shenanigans going on. What? You don't like that? You think everyone should just trust it? What are you afraid of? If you have nothing to hide, you should have no problem with this. That goes for hockey stick boy's climate data and methodologie

    • Let's see. Oh, hell, I don't know, perhaps every voting machine?

      What if they could verify the actual results, not the software?

      Which is exactly what happened with voting. They were verified by hand recounts. Now fuck off.

  • by Jodka ( 520060 ) on Saturday February 06, 2021 @07:43PM (#61035616)

    The co-founder of the company [Cybergenetics], Mark Perlin, is said to have argued against source code analysis by claiming that the program, consisting of 170,000 lines of MATLAB code, is so dense it would take eight and a half years to review at a rate of ten lines an hour.

    Consider the complete idiocy of convicting someone according to evidence generated by software too complex to be audited for correctness.

    The company president's own statements are admission that there can be no confidence in the correctness of its software.

    That alone should result in exoneration. "Should result", not in the sense that it is an expected outcome according to conventional judicial practice, but in the hypothetical sense of if the courts actually worked according to justice and reason.

  • by ceoyoyo ( 59147 ) on Saturday February 06, 2021 @08:52PM (#61035822)

    170,000 lines of MATLAB code

    !?

    That much Matlab code means it's probably cobbled together bits and pieces from a hundred grad students who just needed something to work so they could write their thesis.

    I once worked in a lab where the prof owned a company that did clinical trials. He was showing me some results from a particular trial and saying "the therapy is great! Every single patient got better!"

    *Every* patient? 100% response is a giant red flag. A little digging turned up two things: 1) the code they were using for the analysis was written by a student learning to program, and 2) it discarded the sign of all the results. I recommended that they hire actual software engineers and developers to write, and test, their critical software, rather than using some grad student's learn-to-code-in-Perl project.
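The sign bug is easy to reproduce. A Python sketch with made-up change scores (positive means the patient improved) shows how discarding the sign turns a 50% response rate into a perfect one:

```python
def response_rate(changes):
    """Fraction of patients whose score changed in the good direction (> 0)."""
    return sum(change > 0 for change in changes) / len(changes)

# Hypothetical per-patient change scores: positive = better, negative = worse.
changes = [4.0, -2.5, 1.0, -0.5, 3.0, -1.5]

correct = response_rate(changes)                  # 3 of 6 improved
buggy = response_rate([abs(c) for c in changes])  # sign discarded: all 6 "improved"
print(f"correct: {correct:.0%}, with the sign bug: {buggy:.0%}")
```

A single test case with a known negative change would have caught it.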

    • I can't imagine 170,000 lines of Matlab being the right answer to any problem. Don't get me wrong, Matlab is a fantastic tool for a wide range of problems and I use it extensively, but it doesn't feel designed for very large projects developed by large numbers of people. It takes real effort to debug complex code; it's all too easy to think it's good when it gives you the answer you want.
    • Well, everything that has to do with statistics should be tested against NIST test data [nist.gov]. You will be [surprised|horrified] by the results. Most software does not score well, with Excel/OpenOffice showing up as a numerical joke... but Matlab isn't much better.
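As an example of the kind of failure such reference tests are designed to expose (this is a constructed example, not one of the NIST datasets): the textbook one-pass variance formula loses all precision when the values share a large offset, while Welford's algorithm does not.

```python
def naive_variance(xs):
    """Textbook one-pass formula: (sum(x^2) - sum(x)^2 / n) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    s = sum(xs)
    sq = sum(x * x for x in xs)
    return (sq - s * s / n) / (n - 1)

def welford_variance(xs):
    """Welford's online algorithm for the sample variance."""
    if len(xs) < 2:
        raise ValueError("need at least two values")
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for count, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

# 4, 7, 13, 16 (sample variance 30) shifted by a large common offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:  ", naive_variance(data))
print("welford:", welford_variance(data))
```

The true sample variance of these four values is 30. In double precision the naive formula cannot return that here, because both large intermediate sums are rounded to multiples of 512 before they are subtracted.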
  • Why not rely on copyright to protect their code? Are they worried about pirated versions being bought by *checks notes* the government? Is DNA testing some mysterious science? Or is it far more likely that they don't want to release the code because it is either shit or actually infringes on someone else's IP.

  • Seem obvious (Score:5, Interesting)

    by clambake ( 37702 ) on Saturday February 06, 2021 @10:00PM (#61035994) Homepage

    If you can't check the code then the defendant can present his own proprietary codebase that the prosecution isn't allowed to see which completely exonerates him.

  • It cannot be both.

    The whole definition of evidence is that you reveal information that was previously unknown or secret.

    Any self-respecting judge should ask ...

    Judge: This just says 'guilty'. Can you back up how you came to that conclusion?
    Company: We do not share our code with you!
    Judge: Then you ain't got evidence!
    Company: But you are supposed to trust our judgement!
    Judge: *Who's the judge here, motherfuckers?!*

    • Has Sam L. Jackson ever played a judge?
      I want to see that.
      I mean a courtroom judge, not like Dread or something.

  • I have no idea how they managed to bloat the code so much. DNA analysis is fairly simple, you just need to align small DNA segments. It can be done with a couple hundred lines of code in Python. I would guess a couple thousand lines of code in Matlab would be enough for the most rigorous check.
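For the alignment step this comment describes, a basic Needleman-Wunsch global alignment does fit in a dozen lines of Python. This is only a sketch with arbitrary scoring (+1 match, -1 mismatch, -1 gap); it does not address interpreting low-quality or mixed samples, which an earlier comment notes is where tools like this one are used:

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score via dynamic programming,
    keeping only the previous row of the score matrix."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" with b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)          # aligning a[:i] with ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]

print(global_alignment_score("ACGT", "AGT"))  # best alignment is ACGT / A-GT
```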
  • If you don't like the results of a DNA test, just get a second opinion. In fact, for criminal cases, there should be automatic multiple tests from different services. The services can be selected at random from a list of same, including services in Europe and China. Double-blind tests and best 3-out-of-4 wins.
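The best-3-out-of-4 rule can be sketched as a simple consensus vote over the conclusions reported by independent labs. The verdict labels are hypothetical:

```python
from collections import Counter

def consensus(verdicts, required=3):
    """Return the conclusion reported by at least `required` labs, or None
    if no conclusion reaches that many votes."""
    if not verdicts:
        return None
    verdict, votes = Counter(verdicts).most_common(1)[0]
    return verdict if votes >= required else None

# Hypothetical conclusions from four independent, blinded labs.
print(consensus(["match", "match", "match", "exclusion"]))
print(consensus(["match", "match", "exclusion", "inconclusive"]))
```

With four labs and a threshold of three, at most one conclusion can qualify, so the rule never has to break a tie; anything short of that is reported as no consensus.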

  • It's obvious they don't want to share their source, because it's full of Russian malware and trojans.
