Month: August 2006

Java as a text processing language

I am an old Unix hacker and know how to use tools like Awk/Sed and whatnot to do text processing. I have managed to carry on even in the Windoze days by installing Cygwin.

More recently I’ve been using Python because I wanted to get an understanding of the language. It has much the same functionality, is more modern and has much cleaner syntax. I also like the shorthand ways of filtering lists (list comprehensions), plus lambda functions; the list goes on. Problem is: no-one out there in the real world is gainfully employed using it. Lots of open source stuff, but I need to eat.

I’ve been looking for Java work recently and decided that I would start using Java for the pre-processing work because I want to keep my hand in with the language. I had decided against using it a long time ago because you had to do all sorts of nonsense loading third-party libraries onto the classpath if you wanted to split strings and use regular expressions. After making this decision I steeled myself for a lot of pain, and found I was wrong.

Java 2 has most of the things you need built into the String class these days (split(), matches() and replaceAll() all arrived in J2SE 1.4), and the process was remarkably painless.

The problem

I was trying to boil a file full of runtime trace data down so I could look for long-running functions. It looked like this:

23/08/2006 16:07:47|58067|Entering|FUNC_1
23/08/2006 16:07:47|58068|Leaving|FUNC_1
23/08/2006 16:07:47|58067|Entering|FUNC_2
23/08/2006 16:07:47|58067|Entering|FUNC_1
23/08/2006 16:07:47|58067|Leaving|FUNC_1
23/08/2006 16:07:47|58067|Leaving|FUNC_2

The fields are the timestamp, the number of seconds since midnight, the operation and the function name. I have millions of lines of this and want to remove adjacent lines whose seconds values are the same, i.e. a zero difference. This is simply to give me somewhere to start in my hunt for long-running functions. I know that any real programming environment would have a profiler, but we’re talking PL/SQL here, so you have to roll your own.
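A quick sanity check on the second field, as a sketch: it looks like the seconds since midnight for the time in the first field (16 × 3600 + 7 × 60 + 47 = 58067). The class and method names are hypothetical, and java.time needs Java 8 or later, well after this post was written.

```java
import java.time.LocalTime;

// Hypothetical helper for illustration; not part of the original program.
class SecondsOfDay {
    // Takes "dd/MM/yyyy HH:mm:ss" and returns seconds since midnight
    // for the time part; the date part is ignored.
    static int secondsOfDay(String timestamp) {
        String timePart = timestamp.substring(timestamp.indexOf(' ') + 1);
        return LocalTime.parse(timePart).toSecondOfDay();
    }

    public static void main(String[] args) {
        // 16*3600 + 7*60 + 47 = 58067, matching the second field
        System.out.println(secondsOfDay("23/08/2006 16:07:47"));
    }
}
```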

I wrote a program in Awk that pulled the whole thing into memory and then looked at the fields in the records, but it wouldn’t work. I realised later that I was calling split with the argument |, and I think this split the whole line up into single characters.

Lost patience with Awk and thought OK, try again in Java.

The Solution

I needed to be able to read a line and break it into its delimited components.

Had to do the usual Java nonsense to get the file open:

        BufferedReader r = new BufferedReader( new FileReader( args[0] )) ;

Then get the current line with

                currLine = r.readLine() ;
                if ( currLine == null ) break ;

Ho hum. Next split the strings. First I tried:

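            currBits = currLine.split("|");

(currBits holds the pieces of the string.) This wouldn’t work because split() takes a regular expression, and in a regex | is the alternation operator: on its own it matches the empty string at every position, so the line gets split between every character. I had to use JDB (which is not that bad to use, actually) to work out that the currBits elements were all one character each. Then I changed it to

            currBits = currLine.split("\\|");

This escapes the special character, and it worked. The backslash is doubled because a Java string literal treats \ as an escape too: "\\|" reaches the regex engine as \|, which matches a literal pipe. The rest of the code was schoolperson stuff: using the array of string pieces for comparison, with some read-ahead to look at the next record and work out whether to print it or not.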
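The read-ahead filter might look something like the sketch below. It is a reconstruction, not the original program: it assumes the rule is to keep a record only if its seconds value differs from the previous or the next record’s (so it borders a non-zero gap), and it streams the file with a three-record window rather than loading millions of lines into memory. BufferedReader.lines() needs Java 8 or later; the class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical reconstruction of the trace filter; names are illustrative.
class LogFilter {
    // Field 1 (0-based) of "timestamp|seconds|operation|function".
    private static int seconds(String line) {
        return Integer.parseInt(line.split("\\|")[1].trim());
    }

    // Passes on a record only if the seconds value changes between it and
    // the previous or the next record (an assumed keep/drop rule).
    // Holds prev, curr and next, so only one record of read-ahead is needed.
    static void filter(Iterator<String> lines, Consumer<String> sink) {
        String prev = null;
        String curr = lines.hasNext() ? lines.next() : null;
        while (curr != null) {
            String next = lines.hasNext() ? lines.next() : null; // read ahead
            int s = seconds(curr);
            boolean gapBefore = prev != null && seconds(prev) != s;
            boolean gapAfter = next != null && seconds(next) != s;
            if (gapBefore || gapAfter) {
                sink.accept(curr);
            }
            prev = curr;
            curr = next;
        }
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader r = new BufferedReader(new FileReader(args[0]))) {
            filter(r.lines().iterator(), System.out::println);
        }
    }
}
```

Run against the six sample lines above, this rule keeps the first three records (either side of the 58067/58068 step) and drops the last three, which all share the same seconds value.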

In essence, I can now do simple text processing and line splitting in Java. Regular expressions are also part of the main library these days (java.util.regex.Pattern and Matcher); it’s really moved on from Java 1.1, which was so painful that I stuck to Awk. There’s also a replace() method on String.
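A short sketch of the java.util.regex side, assuming Java 5 or later: compiling a Pattern once and reusing it across millions of lines, Pattern.quote() as an alternative to escaping | by hand, capturing groups through a Matcher, and the literal String.replace(). The helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helpers; the method names are made up for this sketch.
class RegexDemo {
    // Compiled once and reused; "\\|" in Java source is the regex \| (a literal pipe).
    private static final Pattern PIPE = Pattern.compile("\\|");
    // Captures the operation and the function name at the end of a record.
    private static final Pattern OP_FUNC =
            Pattern.compile("\\|(Entering|Leaving)\\|(\\w+)$");

    static String[] fields(String line) {
        return PIPE.split(line);
    }

    // Pattern.quote makes every character literal, so no hand escaping.
    static String[] fieldsQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    // Returns "FUNCTION Operation", or null if the record does not match.
    static String opAndFunction(String line) {
        Matcher m = OP_FUNC.matcher(line);
        return m.find() ? m.group(2) + " " + m.group(1) : null;
    }

    // String.replace treats both arguments literally (no regex involved).
    static String pipesToCommas(String line) {
        return line.replace("|", ",");
    }

    public static void main(String[] args) {
        String line = "23/08/2006 16:07:47|58067|Entering|FUNC_1";
        System.out.println(fields(line).length);   // 4
        System.out.println(opAndFunction(line));   // FUNC_1 Entering
        System.out.println(pipesToCommas(line));
    }
}
```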

Only problem is, strings are immutable, so if you do a lot of string manipulation the intermediate copies get thrown away into the garbage collection void. I wonder if Sun have added the functionality to StringBuffer? I bet not.
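On the StringBuffer question: StringBuffer itself has no split() or regex methods, but it is the mutable buffer that Matcher.appendReplacement() and appendTail() write into (Java 1.4 onwards), and Java 5 added StringBuilder, an unsynchronised equivalent for single-threaded code. A sketch of both, with hypothetical method names:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helpers; names are hypothetical.
class MutableBuffers {
    private static final Pattern FUNC = Pattern.compile("FUNC_\\d+");

    // Builds one output line in a single mutable StringBuilder instead of
    // concatenating a chain of intermediate Strings in a loop.
    static String join(String[] fields, String sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(sep);
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    // Replaces each match with a computed value, appending into one
    // StringBuffer; quoteReplacement stops '$' and '\' being special.
    static String lowerCaseFunctions(String text) {
        Matcher m = FUNC.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toLowerCase(Locale.ROOT)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"58067", "Entering", "FUNC_1"}, "|"));
        System.out.println(lowerCaseFunctions("Entering|FUNC_1 then FUNC_22"));
    }
}
```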

Next!

OK, next on the list: a simple Java editor that lets me do debugging and so on WITHOUT having to create a project and spend half a day messing with it. Emacs is OK, but I’d like code completion and built-in debugging. I know you can get Emacs to do this, but then it takes a year to start up as it loads everything, and I’m not a patient guy with my tools: start up quickly and start giving me what I need NOW. This is probably why I don’t like M$ Windows, even though I use it every day.

The Trap

I sit here late at night and watch old movies

I want to feel the life

Just being and doing what I want when I want

Watching those films

Writing crazy stuff late at night

Not getting up until I’m ready

But I am trapped by the life we are expected to lead

Its norms and funny faces

I’ve been a fool for so long

Not looking out for the real things

Playing with what little I had

Without realising that, with a little more effort, I could be free.

I could have been free a long time ago.

Precision Arithmetic

http://www.regdeveloper.co.uk/2006/08/14/rounding_issues/

I worked briefly at a bank, and one of the guys was trying to write a little JavaScript app that allowed punters to see how much interest they would pay over a given period without having to go back to the server all the time. Really simple bit of dividing over a 3-year period.

JavaScript uses doubles for all numbers, and the jolly old IEEE 754 standard for how they behave in calculations.

9/3 was coming out at 2.9999999999 (9999999 …)

and there was no way to fix it.

I still don’t know how he made it look right.

I had a hack at it in Java for him: floats came out at 3, but doubles had the same problem. Java and JavaScript both follow IEEE 754, so no surprise.

It still amuses me that more precision meant a less accurate result, at least to a human.
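For money, the usual Java answer is java.math.BigDecimal rather than float or double. Strictly speaking, 9.0 / 3.0 comes out as exactly 3.0 in IEEE 754 doubles, because 9, 3 and the quotient are all exactly representable; errors like the one described creep in through values such as 0.1 that have no exact binary representation. A minimal sketch, assuming Java 5 or later for RoundingMode; the names and the 36-month split are only illustrations:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative helpers; names and figures are hypothetical.
class MoneyMaths {
    // 0.1 and 0.2 are stored as the nearest binary fractions, so the
    // double sum is not exactly 0.3 (it prints as 0.30000000000000004).
    static double doubleSum() {
        return 0.1 + 0.2;
    }

    // BigDecimal built from strings holds the exact decimal values.
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    // Division needs an explicit scale and rounding mode: without them,
    // a non-terminating result such as 100/3 throws ArithmeticException.
    static BigDecimal share(BigDecimal total, int parts) {
        return total.divide(BigDecimal.valueOf(parts), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(9.0 / 3.0);                              // 3.0
        System.out.println(doubleSum());                            // 0.30000000000000004
        System.out.println(decimalSum());                           // 0.3
        System.out.println(share(new BigDecimal("1000.00"), 36));   // 27.78
    }
}
```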

Victims and not

I have a no-longer friend who is a constant victim. Their helplessness makes me angry. I know it shouldn’t, but I’ve been on the receiving end of their aggression for many years and was never able to sort it out properly until now.

I was chatting about this to one of my other friends, saying that I feel helpless and can’t help them. In fact, my attempts at interaction make me lose my temper and make things worse. I know, adults should be able to keep control of their tempers, but I didn’t manage to on this occasion.

He said a very telling thing: if you feel like a victim, then you are a victim. This struck me as very perceptive. It isn’t anything to do with your environment; your victimhood penetrates everything you think and are. If you think everyone hates you, then they will, and your mind will create the colours of their hatred.

This reminded me of something one of my teachers talked about when I studied English Literature a long, long time ago: the pathetic fallacy. In essence, in Romantic literature, as the hero dies, the sun sets. It’s more obvious in paintings, but it is there in written literature as well.

The person I am talking about lives in a world totally dominated by this fallacy. I used to think like this a bit; I would see motivations that weren’t there in other people’s behaviour. In fact, most people do not care what you say or do as long as you let them be. The inconsiderate behaviour of others isn’t a conspiracy; they’re just inconsiderate and selfish, neither of which is a crime or particularly premeditated. If you talk gently to most people and say you aren’t happy about something, they will listen, and probably be a little surprised; usually their reasons are very good and you realise you were being a little silly. If you start from the position that they are out to get you, and act in a dumb, aggressive, hostile fashion, they will not listen, because your behaviour cues their defences, and then they will be out to get you.

But how do you tell someone this when you are one of their enemies, if not the most hateful of all?

I can’t, and of course the feelings and the pain are real. I can’t do anything about it, either, because I just lose it and add to the burden they think they are carrying.

I suppose I have learned something valuable, even if it does make me feel sad.

Date tracked views can bite you

Our product contains views of networks and inventory that are date-tracked so that you can wind the clock backward and see what the network looked like at a particular time.

Some of the lookup metadata used for validation is also date tracked because we re-used some existing tables.

I had a strange experience:

Run the test script: an error saying something isn’t there. Check by typing in the SQL: it isn’t. OK. Log back in again: it is…

The test script was setting the effective date so that, when it end-dated some existing data, the end date was the one it should have been rather than the system date. That setting lasted for the whole session (logging back in cleared it), so when updates on other data were done afterwards, the lookups weren’t there because they fell outside this range.
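The trap can be modelled in a few lines. This is a hypothetical Java sketch, not our schema or any Oracle API: it assumes each date-tracked row is valid from a start date (inclusive) to an end date (exclusive, null meaning not end-dated), and that the view filters on the session’s effective date rather than today’s. A lookup row that starts after the effective date the test script set is simply invisible.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a date-tracked lookup table; names are illustrative.
class DateTracked {
    static final class Row {
        final String code;
        final LocalDate start; // inclusive
        final LocalDate end;   // exclusive; null = not end-dated

        Row(String code, LocalDate start, LocalDate end) {
            this.code = code;
            this.start = start;
            this.end = end;
        }

        boolean visibleAt(LocalDate effective) {
            return !effective.isBefore(start)
                    && (end == null || effective.isBefore(end));
        }
    }

    // What the view effectively does: filter on the session's effective
    // date, not on the system date.
    static List<String> lookup(List<Row> rows, LocalDate effective) {
        List<String> visible = new ArrayList<>();
        for (Row r : rows) {
            if (r.visibleAt(effective)) {
                visible.add(r.code);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("ACTIVE", LocalDate.of(2006, 8, 1), null));
        System.out.println(lookup(rows, LocalDate.of(2006, 7, 1)));  // []
        System.out.println(lookup(rows, LocalDate.of(2006, 8, 23))); // [ACTIVE]
    }
}
```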

Deep joy. The only way I know of to tie temporal data together on modern databases is with triggers and application logic; I’d be very interested in a generic way of doing this that let you express range dependencies properly. On our current system this usually only comes up on update, typically when trying to end-date a parent that still has children which are not end-dated.
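The end-dating check itself is easy to state; the pain is having to hand-write it for every parent/child pair. A hypothetical sketch of the rule a trigger or application code would enforce, in Java only to show the logic, assuming a child’s period must end no later than its parent’s and null means not end-dated (the PORT_ names are invented):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical range-dependency check; names are illustrative.
class RangeCheck {
    // True if a child ending at childEnd fits inside a parent ending at
    // parentEnd. null means "not end-dated" (open-ended).
    static boolean endsWithinParent(LocalDate childEnd, LocalDate parentEnd) {
        if (parentEnd == null) {
            return true;   // an open parent covers any child
        }
        if (childEnd == null) {
            return false;  // an open child outlives a closed parent
        }
        return !childEnd.isAfter(parentEnd);
    }

    // Lists the children that would block end-dating the parent at parentEnd.
    static List<String> blockingChildren(LocalDate parentEnd, Map<String, LocalDate> childEnds) {
        List<String> blockers = new ArrayList<>();
        for (Map.Entry<String, LocalDate> e : childEnds.entrySet()) {
            if (!endsWithinParent(e.getValue(), parentEnd)) {
                blockers.add(e.getKey());
            }
        }
        return blockers;
    }

    public static void main(String[] args) {
        Map<String, LocalDate> children = new LinkedHashMap<>();
        children.put("PORT_1", null);                      // not end-dated
        children.put("PORT_2", LocalDate.of(2006, 8, 1));
        System.out.println(blockingChildren(LocalDate.of(2006, 8, 23), children)); // [PORT_1]
    }
}
```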

I think that there is a book on this but I can’t afford it right now and the people I work for wouldn’t buy it for us because 50p on a book that could save us thousands just isn’t the way they think…

At least I’ll know where to look first next time.