I made some artificial test files based on real data. Some files mix their sources together heavily, while others are straight cuts.
So, one very naive test file, made just out of interest, combined two species of bacteria by joining them at roughly the halfway point of one of their contigs. The GC Content chart then looked something like this:
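For anyone wanting to reproduce this kind of chart, here is a minimal sketch of building a two-species splice and computing a sliding-window GC profile. It assumes hypothetical stand-ins: random sequences at 35% and 65% GC instead of real contigs, a 1000 bp window and a 500 bp step. None of these values come from my actual test files.

```python
import random

def gc_fraction(seq):
    """Fraction of G/C bases in seq (case-insensitive)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def sliding_gc(seq, window=1000, step=500):
    """GC fraction of each full window of `window` bases, advancing by `step`."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]

def random_seq(length, gc, rng):
    """Hypothetical stand-in for a real contig: random bases with expected GC fraction `gc`."""
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(length))

rng = random.Random(42)
contig_a = random_seq(10_000, 0.35, rng)  # assumed low-GC species
contig_b = random_seq(10_000, 0.65, rng)  # assumed high-GC species

# Naive splice: first half of contig_a followed by second half of contig_b.
chimera = contig_a[:len(contig_a) // 2] + contig_b[len(contig_b) // 2:]
profile = sliding_gc(chimera)
print([round(v, 2) for v in profile])
```

Plotting `profile` against window index should give the same kind of step as the chart above: a flat low run, then a flat high run.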
Kinda out there!
Was this what I expected? Well, I did expect a noticeable shift like that, but the weird thing is how the standard deviation behaves. This may just be my lack of understanding, but looking at that chart you can SEE where the huge difference lies. Because the mean sits across the middle, between the two GC levels, the standard deviation threshold only gets a bit mad when values are a fair way above or below the mean. Part of the reason is probably that the jump between the two halves also inflates the standard deviation itself, which widens the threshold.
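A small idealised example shows the effect. This sketch assumes the threshold flags windows more than k population standard deviations from the mean (k = 2 here is an assumption, not necessarily the value my tool uses), and uses a noise-free profile of 10 windows at 35% GC followed by 10 at 65% GC.

```python
import statistics

def sd_outliers(values, k=2.0):
    """Indices of values more than k population standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]

# Idealised two-species profile: a clean step from 35% to 65% GC.
step_profile = [0.35] * 10 + [0.65] * 10

print(statistics.fmean(step_profile))   # roughly 0.5: the mean sits between the two levels
print(statistics.pstdev(step_profile))  # roughly 0.15: the step itself inflates the SD
print(sd_outliers(step_profile))        # []
```

Every window sits only about one standard deviation from the mean, so nothing gets flagged, even though the shift is obvious by eye.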
This tells me it’d be good to have another measure that looks for dramatic shifts in GC Content percentage like this one. I’m not sure I have the time to do this right now, so I’m jotting it down here as a note: something to do if I have time left after the bulk of my documentation, or otherwise as a future task.
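One possible shape for that extra measure, offered only as a sketch and not something I've implemented: a single least-squares changepoint search over the windowed GC values. It tries every split point, fits a separate mean to each side, and keeps the split with the lowest total squared error. The `MIN_JUMP` value of 10 percentage points is a hypothetical cutoff.

```python
def best_split(values):
    """Return (index, jump) for the split minimising the summed squared error of
    fitting one mean to values[:index] and another to values[index:].
    `jump` is the right-segment mean minus the left-segment mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    best_idx, best_cost, best_jump = None, float("inf"), 0.0
    for idx in range(1, n):
        left, right = values[:idx], values[idx:]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        cost = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if cost < best_cost:
            best_idx, best_cost, best_jump = idx, cost, rm - lm
    return best_idx, best_jump

MIN_JUMP = 0.10  # hypothetical: only report shifts of 10+ percentage points

step_profile = [0.35] * 10 + [0.65] * 10
idx, jump = best_split(step_profile)
if abs(jump) >= MIN_JUMP:
    print(f"possible GC shift at window {idx}: {jump:+.2f}")
```

Unlike the mean ± SD threshold, this looks at the difference between neighbouring segments rather than at distance from a global mean, so a clean step like the one above is exactly what it picks up. It only finds one shift; for files with several, one option would be to apply it recursively to each side (binary segmentation).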