Discussion:
Format codes for comma separated data
Brian Larsen
2007-12-20 20:05:24 UTC
Hello all,

I must be slow today, but I can't figure this out. I have huge ASCII
files that are all comma separated. I am currently reading them in
line by line and splitting each line with strsplit(). That worked, but
now I have files with a line every 30 s for 13 years, and I am not
that patient.

I found this really old post saying you don't need to worry about the
commas, but that doesn't seem to work for this data. I think the reason
is that some fields are strings and others are floats.
http://groups.google.com/group/comp.lang.idl-pvwave/browse_frm/thread/960754afec3a8169/efd7a2182fe40447?lnk=gst&q=format+with+commas#efd7a2182fe40447

What I normally do is create a struct array and read a file into that
with each column having a structure tag so I can stay organized. But
I can't figure out the format codes to do that this time...

The data looks like this:
126489604,1996-01-04T00:00:04 ,01/04/96 00:00:04, 19061, 23,
2, 3, 7.009E+03, 3.536E+02, 6.361E+01, 6.481E+02, -3.545E+02,
3.095E+03, 6.279E+03, -3.162E+00, 6.081E+00, -3.085E+00,
-9.772E-01, 2.088E-01, 3.757E-02, -2.047E-01, -8.816E-01,
-4.254E-01, -5.570E-02, -4.234E-01, 9.042E-01, 6.150E+02,
-4.040E-02, 2.000E+00, 7.046E+03, 7.994E+01, 6.333E+01,
1.819E-01, 5.307E+00, 3.935E-01, 1.452E-01, 6.427E+01, 6.665E+01,
6.318E+01, 8.319E+01, 4.845E-02, -2.560E-01, -2.949E-01,
-3.796E-01, -1.017E-01, -1.900E-02, -1.785E-02, 5.315E-02,
-2.967E-01, -4.012E+02, 2.855E+02, 1.946E+02, -1.058E+01, -7.476E+01,
7.141E+03, 6.264E+01, 1.304E+02, 4.168E+01, ...

where the ... is about 1/2 way down the 134 columns in each line.
I tried the obvious:
IDL> dat = strarr(134,3)
IDL> readf, lun, in, dat
and
IDL> dat = create_struct('d1','','d2','', 'd3','', 'orbit', 0, 'num', 0)
IDL> readf, lun, in, dat

In both cases the whole line ends up in each string element, although
the integers seem right...

How do I need to think about this differently?


Cheers,

Brian

--------------------------------------------------------------------------
Brian Larsen
Boston University
Center for Space Physics
David Fanning
2007-12-20 20:13:30 UTC
Post by Brian Larsen
> What I normally do is create a struct array and read a file into that
> with each column having a structure tag so I can stay organized. But
> I can't figure out the format codes to do that this time...

Why not? Because they vary from line to line?

Cheers,

David
--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")
Brian Larsen
2007-12-20 20:17:26 UTC
Post by David Fanning
> Post by Brian Larsen
>> What I normally do is create a struct array and read a file into that
>> with each column having a structure tag so I can stay organized. But
>> I can't figure out the format codes to do that this time...
> Why not? Because they vary from line to line?

No, the fields are the same from line to line; readf is not breaking up
the line.

When you do
IDL> dat = strarr(134,3)
IDL> readf, lun, in, dat
you don't get one number per array element, you get one line per
element.





Brian

Brian Larsen
2007-12-20 20:21:09 UTC
Here is a simple example of the issue:

I made a test file called test_dat.txt that has a bunch of lines of
a, b, 0, 1, 2, 3
a, b, 0, 1, 2, 3
a, b, 0, 1, 2, 3
a, b, 0, 1, 2, 3
a, b, 0, 1, 2, 3
a, b, 0, 1, 2, 3
a, b, 0, 1, 2, 3


IDL> file = 'test_dat.txt'
IDL> openr, lun, file, /get_lun
IDL> a = strarr(6)
IDL> readf, lun, a
IDL> print, a
a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3
a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3


What I would hope to get would be:
a, b, 0, 1, 2, 3


Issue make sense?


Brian

Tom
2007-12-20 21:32:29 UTC
Post by Brian Larsen
> I made a test file called test_dat.txt that has a bunch of lines of
> a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3
> IDL> file = 'test_dat.txt'
> IDL> openr, lun, file, /get_lun
> IDL> a = strarr(6)
> IDL> readf, lun, a
> IDL> print, a
> a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3 a, b, 0, 1, 2, 3
> a, b, 0, 1, 2, 3

Brian,
this looks correct-
print, a[0] ;for the first line

Tom
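
For context on why each element holds a whole line: in free-format
input (no FORMAT keyword), IDL fills a string variable with everything
left on the current line, so commas are not treated as separators for
strings. A minimal sketch of the read-everything-then-split approach
follows, assuming the test_dat.txt file from the post above. The names
nlines, lines, and fields are only illustrative.

```idl
; Sketch only: slurp the file into a string array, then split on commas.
file = 'test_dat.txt'                 ; test file from the post above
nlines = file_lines(file)             ; count the lines in the file
lines = strarr(nlines)                ; one string element per line
openr, lun, file, /get_lun
readf, lun, lines                     ; each element receives one full line
free_lun, lun

; Split the first line into its comma-separated fields and trim blanks
fields = strtrim(strsplit(lines[0], ',', /extract), 2)
print, n_elements(fields)             ; number of fields found on line 1
help, fields
```

Note that strsplit drops empty fields unless /PRESERVE_NULL is set,
which matters if a real data file can contain two commas in a row.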

David Fanning
2007-12-20 20:22:57 UTC
Post by Brian Larsen
> No, the fields are the same from line to line; readf is not breaking up
> the line.
> When you do
> IDL> dat = strarr(134,3)
> IDL> readf, lun, in, dat
> you don't get one number per array element, you get one line per
> element.

Well, why not use a FORMAT statement, then?

Cheers,

David
Brian Larsen
2007-12-20 20:25:26 UTC
Post by David Fanning
> Well, why not use a FORMAT statement, then?

Exactly, how do you include commas in the format statement? They seem
to just be separators, or I am really slow today... I hope it's not
really slow, or maybe that's OK as xmas is close :)






Brian

David Fanning
2007-12-20 20:30:03 UTC
Post by Brian Larsen
> Exactly, how do you include commas in the format statement? They seem
> to just be separators, or I am really slow today... I hope it's not
> really slow, or maybe that's OK as xmas is close :)

I usually use an "x" to represent commas. ;-)

Unless, of course, you want to read them as something.
But I usually just ignore them.

Cheers,

David
Brian Larsen
2007-12-20 20:35:58 UTC
Post by David Fanning
> I usually use an "x" to represent commas. ;-)
> Unless, of course, you want to read them as something.
> But I usually just ignore them.

OK, but none of my format attempts are working... For my simple
example above it seems that format='(2A0,4I0)' should work, but it
doesn't.

IDL> dat = {a:'', b:'', c:0, d:0, e:0, f:0}
IDL> openr, lun, file, /get_lun
IDL> readf, lun, dat, format='(2A0,4I0)'
% End of input record encountered on file unit: 101.
% Execution halted at: $MAIN$





Brian

Brian Larsen
2007-12-20 21:08:57 UTC
OK, I got it, but I'm not sure it makes sense.

For the simple example above, the format code
IDL> readf, lun, dat, format='(2(A,x), 4(I,x))'
% End of input record encountered on file unit: 101.
% Execution halted at: $MAIN$
does not work, while the format code
IDL> readf, lun, dat, format='(2(A2,x), 4(I,x))'
does work:
IDL> help, dat, /str
** Structure <1eaa8c4>, 6 tags, length=32, data length=32, refs=1:
   A               STRING    'a,'
   B               STRING    'b,'
   C               INT              0
   D               INT              1
   E               INT              2
   F               INT              3

So if you don't specify the length of the first string, it takes
everything?!?!




Brian
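
On the question above: the error is consistent with an A code that has
no width consuming the rest of the input record, which leaves nothing
for the remaining codes. One way around it for data written in fixed
columns is to give every field an explicit width and skip the
separators with X. The sketch below assumes every line of test_dat.txt
is laid out exactly as 'a, b, 0, 1, 2, 3' (one-character fields, each
followed by a comma and a space) and that the file has no blank lines.
The widths would have to be changed for real data, and the names
nlines and fmt are illustrative.

```idl
; Sketch only: fixed-width read of the test file into a structure array.
; Assumes each line is exactly 'a, b, 0, 1, 2, 3' with no blank lines.
file = 'test_dat.txt'
nlines = file_lines(file)             ; one structure per line
dat = replicate({a:'', b:'', c:0, d:0, e:0, f:0}, nlines)
fmt = '(A1,2X,A1,2X,I1,2X,I1,2X,I1,2X,I1)'   ; 2X skips the ', ' after a field
openr, lun, file, /get_lun
readf, lun, dat, format=fmt           ; the format is reused for each new line
free_lun, lun
help, dat[0], /structure
```

With A1 instead of A2 the comma is skipped rather than stored in the
string. If reading the whole array in one call misbehaves, reading one
structure per line in a loop with the same format is a fallback.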
