Processing a huge file

I went to NCBI and found a large file, one I thought my application might struggle with. It took time, but it processed successfully. If this application were ever actually used, it would need to be hosted somewhere with a large amount of memory to get through sequences of this size. Anyway, here are some screenshots of what happened.

The contig itself is Campylobacter coli strain CVM N287, N287_contig_8, which is 207,270 bases long.
I’m still working on ‘Superframe’, but you can see that most of the GC-content regions falling outside the mean threshold lie within the ORF locations, which is exactly what I hoped to see.
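To make the comparison above concrete, here is a minimal sketch of the two pieces of that check. The function names and window/threshold parameters are my own illustrative choices, not the application’s actual code: flag sliding windows whose GC fraction deviates from the whole-sequence mean, and find naive forward-strand ORFs to compare against.

```python
def gc_content(seq):
    """Fraction of G/C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_outlier_windows(seq, window=500, step=100, threshold=0.05):
    """Sliding windows whose GC fraction differs from the mean by more than threshold."""
    windows = [(i, gc_content(seq[i:i + window]))
               for i in range(0, len(seq) - window + 1, step)]
    mean_gc = sum(gc for _, gc in windows) / len(windows)
    return [(i, gc) for i, gc in windows if abs(gc - mean_gc) > threshold]

def find_orfs(seq, min_len=300):
    """Naive forward-strand ORFs: ATG through the first in-frame stop codon."""
    seq = seq.upper()
    stops = {"TAA", "TAG", "TGA"}
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))  # (start, end) half-open
                start = None
    return orfs
```

The check then amounts to asking, for each outlier window, whether its span overlaps any of the `(start, end)` intervals returned by `find_orfs`.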

[Screenshots: hugefileprocessedfullsequence 1–4]

The next step is to find a few more contigs, smaller in size for time’s sake, and run them individually to confirm that their GC-content regions match their ORF locations. Then I’ll start mixing them together, where I know there should be differences in GC content, and see whether I can still pick those differences out after processing the mixed contig data.
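The mixing experiment can be sketched as follows. This uses synthetic contigs standing in for the real NCBI data, with illustrative helper names of my own: join two sequences with different GC content and scan windows across the boundary, expecting the GC profile to shift at the join point.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def scan(seq, window=300, step=50):
    """GC fraction for each sliding window as (start, gc) pairs."""
    return [(i, gc_fraction(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]

# Two synthetic contigs standing in for real contig data.
contig_a = "ATATATATAT" * 100   # AT-rich
contig_b = "GCGCGCGCGC" * 100   # GC-rich
mixed = contig_a + contig_b

profile = scan(mixed)
# Windows near the start are low-GC, windows near the end are high-GC,
# with a transition around the join point (position 1000).
```

If the real mixed contig behaves the same way, the transition in the GC profile should mark where one contig ends and the next begins.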

