
Quote of the day


I should declare that I am an extensive user of freedom of information legislation, particularly as regards universities, which I have found unutterably tiresome and difficult to deal with. One of their more tiresome habits is to refuse to provide information in anything other than PDF format. They get it in Excel, or whatever form, and translate it into PDF to provide it to me, merely to cause me extra work. I have to buy a program to suck it out of the PDF again. PDF is not a transmissible format, as it were, and they are merely trying to make life difficult by putting it in that format. So I would like to be sure that when data are provided they are provided in a properly reusable format. I have never come across a data set that cannot be reduced to tabbed, delimited text. Maybe that happens in a collection of tables, but data are essentially a simple thing. Although the data may be held in an immensely complex form in the program that the scientists are using, in any program that I have come across it should be easy - if only for the purposes of sharing with other people - to drop out at least the base data into relatively simple form.
Lord Lucas, speaking in the debate on the Protection of Freedoms Bill, has clearly experienced some of the same frustrations as others who have tried to get information from universities.
Reader Comments (14)
Climate-XML
Simple. Problem fixed
I wonder how many of his fellow lords understood any of that?
The real problem is scanned information, whether in PDF or not. If one wishes to quote from the document it is hard work, and if long underlined URLs containing underscores are used it is a real pain.
In fairness to the Met Office, when I complained and asked for a character based copy they apologised, remedied it quickly and said they understood the issue. So ask for an electronically readable copy any time people do it.
I must get my eyes tested again. For a moment I thought it was Lord Lucan.
Lord Lucas studied Physics at Oxford University. And he is a Chartered Accountant.
And he is a hereditary peer, not answerable to those who ennobled him.
http://en.wikipedia.org/wiki/Ralph_Palmer,_12th_Baron_Lucas
Would that there were more independently-minded, intelligent and thoughtful peers like him, active in the business of the House of Lords.
Could it be that one reason for wrapping up the data as a PDF is to make it relatively tamperproof?
However, there's no reason why the data shouldn't be provided both as a spreadsheet file and as a PDF.
Cassio - and that is precisely why the last Government, in particular, reduced the hereditary peerage representation as much as possible.
They can't be trusted to toe the Party line.
I sympathize with data distribution by PDF. I once wasted several days debugging a problem in data conveyed by Excel file, only to discover that the problem had been introduced (innocently) by the receiver. He had worked directly on the file I sent him, modified it and not kept the original, so much time was wasted tracking the problem down - and all with no ill intent.
Likely the best course is to send a password-protected PDF as the archive version, and ALSO the data in whatever the most usable native format might be, CSV for example.
I've been lurking here for a while but this is the first time this software geek has felt competent enough to comment on something.
As others have said, it's pretty straightforward - for tabular data, CSV is ideal. For datasets where you wish to convey relationships, XML is king, or if verbosity is a problem then JSON is possibly acceptable.
There are parsers for any of those three for virtually any app you may wish to use within those contexts.
Sorry, I forgot to add - the added benefit of using text rather than binary formats such as PDF is that they tend to compress much better, so generally you'll get more bang for your byte from your zip files.
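To make the point about parsers concrete, here is a minimal sketch in Python, using only the standard library, that writes the same small table as CSV, JSON and XML and reads each one back. The records, field names and helper functions are invented purely for illustration; they are assumptions, not anything taken from a real FOI response.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, made up for this example only.
rows = [
    {"station": "A", "year": "2010", "temp_c": "9.7"},
    {"station": "B", "year": "2010", "temp_c": "11.2"},
]


def to_csv(records):
    """Serialise a list of flat dicts as comma-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Serialise the same records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_xml(records):
    """Serialise the records as <records><record><field>value</field>...</record></records>."""
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(rows)
json_text = to_json(rows)
xml_text = to_xml(rows)

# Read each format back with the matching standard-library parser.
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
from_json = json.loads(json_text)
from_xml = [
    {field.tag: field.text for field in record}
    for record in ET.fromstring(xml_text)
]
```

All three round trips give back the original records, which is really the whole argument: once the data are in plain text, the recipient can load them with a few lines in almost any language.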
@Cassio:
"Would that there were more independently-minded, intelligent and thoughtful peers like him, active in the business of the House of Lords."
I'm afraid he's now top of the list for the cull in the next constitutional shake-up.
tabbed, delimited text? CSV is so much easier to handle
diogenes - CSV = Comma Separated Values = delimited fields with commas, no? I think Excel likes it just the same as tabs.
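For what it is worth, the tab-versus-comma question mostly comes down to one parameter in the parser. The sketch below uses Python's standard csv module; the sample text and the parse_delimited helper are hypothetical, made up only to show the idea.

```python
import csv
import io

# Hypothetical sample data: the same table, once tab-delimited, once comma-delimited.
tab_text = "name\tvalue\nalpha\t1\nbeta\t2\n"
comma_text = "name,value\nalpha,1\nbeta,2\n"


def parse_delimited(text, delimiter):
    """Parse delimited text into a list of rows (each row a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


tab_rows = parse_delimited(tab_text, "\t")
comma_rows = parse_delimited(comma_text, ",")

# A field that itself contains a comma has to be quoted in CSV;
# csv.writer adds the quotes automatically.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(["Lucas, Lord", "1"])
quoted_line = buf.getvalue()
```

Both inputs parse to the same rows. The one practical difference is that commas turn up inside real data far more often than tabs do, so comma-delimited files lean more heavily on quoting.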
General solution: slash government funding by £10,000 each time an FOI is dealt with obstructively.