Month: February 2006

US Social Security Numbers – a basis for fraud?

In reply to this article:

I think a lot of UK/European readers won’t get why the social security number thing is such a big deal. If memory serves, some genius back in the early days of US IT decided that, rather than give everybody their own customer number, they’d just use the guaranteed unique SS number. This soon became common practice.

So, it’s not that McNealy’s SS number is compromised particularly, more that a knowledgeable hacker can use this number when they break into other systems to find out things about him, and also to pretend to be him and commit fraud.

In the UK I don’t think most of us would give a toss if someone knew our NI number, because it isn’t plastered all over our credit card vendor’s internal systems. I do wonder if this will change if the UK government manage to get their crackpot ID card scheme off the ground. Will this number then start mattering because it will be plastered everywhere, like it is in the US? Then the hackers will find committing fraud (sorry, “identity theft”) much easier. I bet no-one’s thought about it at all.

Regards,

Francis Fish

Fun with XPath: Combining many fields in a query.

This is one of those things that are so obvious I felt like kicking myself when I realised how to do it.

First, some XML:

<?xml version="1.0" standalone="yes"?>
<styling_rules>
  <rule column="NAME">
    <features style="MAG:C.WASSNAME.CHOCOLATE"> NAME='T BAR BOO' </features>
    <label column="NAME" style="MAG:T.STREET NAME">1</label>
  </rule>
  <rule column="NAME">
    <features style="MAG:C.WASSNAME.CORAL"> NAME='UP A CREEK' </features>
    <label column="NAME" style="MAG:T.STREET NAME">1</label>
  </rule>
  <rule column="NAME">
    <features style="MAG:C.WASSNAME.CORNFLOWERBLUE"> NAME='UNHAPPY VALLEY' </features>
    <label column="NAME" style="MAG:T.STREET NAME">1</label>
  </rule>
</styling_rules>

This has been anonymised for the purposes of this discussion. The tool that uses the data flattens out the two elements inside the rule element, so that’s fine. I needed an XPath that would get me a particular rule node so that I could manipulate it.

I put my SQL head on and looked for two sets that I could combine:

xPath = "/styling_rules/rule[column='NAME']/features[style='MAG:C.WASSNAME.CORNFLOWERBLUE' " +
" and /styling_rules/rule[column='NAME']/features/[text() = \" NAME='UNHAPPY VALLEY' \" " ;

Nah, don’t even go there, it doesn’t choke but it doesn’t do anything. Instead think directory path and the *nix find command combined together (and remember the @ in front of attribute names):

xPath = "/styling_rules/rule[@column='NAME']/features[@style='MAG:C.WASSNAME.CORNFLOWERBLUE' " +
" and ./text() = \" NAME='UNHAPPY VALLEY' \" ]/parent::*" ;

This should get me the whole node and I can then update the elements in it to my heart’s content.
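For comparison, here is a minimal sketch of the same lookup in Python, assuming the standard library's xml.etree.ElementTree (Python 3.7 or later). ElementTree only implements a subset of XPath and has no "and", so the two conditions are written as two chained predicates instead. The find_rules helper and the cut-down SAMPLE document are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the sample document above (two rules only).
SAMPLE = """<styling_rules>
<rule column="NAME">
<features style="MAG:C.WASSNAME.CORAL"> NAME='UP A CREEK' </features>
<label column="NAME" style="MAG:T.STREET NAME">1</label>
</rule>
<rule column="NAME">
<features style="MAG:C.WASSNAME.CORNFLOWERBLUE"> NAME='UNHAPPY VALLEY' </features>
<label column="NAME" style="MAG:T.STREET NAME">1</label>
</rule>
</styling_rules>"""


def find_rules(root, style, text):
    """Return the rule elements whose features child matches style and text."""
    # ElementTree has no "and", so each condition is its own predicate.
    # The literals are double-quoted because the text contains single quotes.
    # The trailing /.. steps back up from features to its parent rule.
    path = ('rule[@column="NAME"]'
            '/features[@style="%s"][.="%s"]/..' % (style, text))
    return root.findall(path)


root = ET.fromstring(SAMPLE)
matches = find_rules(root, "MAG:C.WASSNAME.CORNFLOWERBLUE", " NAME='UNHAPPY VALLEY' ")
for rule in matches:
    print(rule.tag, rule.find("features").get("style"))
```

The text comparison is exact, so the leading and trailing spaces inside the features element have to be part of the literal.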

Java: finding nodes with xPath and how to Dump out XML DOM as a string

After some google hacking (and reading Building Oracle XML Applications, which I thought was way out of date, my first edition of Java and XML was useless) I found this kind of stuff:

XPath

To use XPath you need an XMLDocument object:

  DOMParser p = new DOMParser();
  p.parse(new StringReader(xml));
  Document doc = p.getDocument();
  XMLDocument xmldoc = (XMLDocument)doc ;
  NodeList nl = xmldoc.selectNodes(xPath) ;

Now we have a loverly list of nodes, and off we go.  Note that this suffers from the usual Java nonsense of returning a null if there’s nothing there.

Getting a String version of a DOM Document

toString()

Hmmm – just gives the object ID. How useful is that? Not very.

I noticed that the XMLDocument class has a print method that you can pass an OutputStream to. OK, I think, pass it a StringWriter?

        StringWriter sw = new StringWriter() ;
        xmldoc.print(sw);
        System.out.println(sw.toString());

This compiles but throws a Null Pointer Exception. Must be the interfaces matching but not doing what they’re supposed to. I dig around in the class documentation and find that I can pass a PrintWriter – this works:

    StringWriter sw = new StringWriter() ;
    PrintWriter pw = new PrintWriter(sw);
    xmldoc.print(pw);
    System.out.println(sw.toString());

No idea why, haven’t got time to mess around finding out. It works. I think this is superior to the solutions I saw where people were calling the serialize method on some weird class, or writing their own code that dumps out the contents of a node and passing the root node to it (there are lots of examples on the Web if you look around). Note that this doesn’t work with fragments, at least it doesn’t seem to, have a play with it.
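The same "print the document into an in-memory writer" idea carries over to other DOM libraries. As a rough parallel only (this is Python's standard xml.dom.minidom, not the Oracle parser used above), a document can be written into a text buffer and read back as a string:

```python
import io
from xml.dom import minidom

doc = minidom.parseString("<rule column='NAME'><label>1</label></rule>")

# Write the document into an in-memory text buffer, much as the
# StringWriter/PrintWriter pair does in the Java above.
buffer = io.StringIO()
doc.writexml(buffer)
as_text = buffer.getvalue()

# toxml() is the one-call shortcut for the same thing.
print(as_text == doc.toxml())
print(as_text)
```

In minidom every node has toxml(), so a fragment such as doc.documentElement can be dumped the same way.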

Coming up: “Fun with XPath: Combining many fields in a query” (when you aren’t trying to write some XSL).

Forwarding mail from a webmail account using python

NOTE: The code here is provided for discussion purposes, if you choose to use it yourself on your own head be it! I used Python 2.4 with emacs and the python.el mode file.

POP mail from my Macmail account hasn’t worked since December. I’ve emailed the support people a few times to no avail. It’s a free service and I don’t think anyone’s home any more. I decided to try and write a program that would pretend to be a browser and just forward everything on to my gmail account.

This was surprisingly easy.

I decided to use Python, because I know it a bit. CLisp was my second choice but there doesn’t seem to be the same wealth of examples on the ‘net. I’ve done stuff like this in Java, but you have to do crazy things like run it through jtidy first and treat the html as xml, which is a complete pain. Python’s sgmllib just works.

I read the HTML processing chapter of Dive Into Python, which gave me a grounding in what I needed to do with SGML processors and stuff.

But first, I needed to learn how to log onto the mail service using cookies. I found this on the ClientCookie module, and that did the trick.

I then made a big messy file that I could run in emacs and keep stuffing prototype code into the Python interpreter. After some work I came up with this for the logon:

macMailURL = "http://mail.macmail.com"

def login2Macmail():
    request = ClientCookie.Request(macMailURL)
    # note we’re using the urlopen from ClientCookie, not urllib2
    response = ClientCookie.urlopen(request)
    firstPage = response.read()
    #print firstPage
    # Now we need the sessionID
    data = "login=aname&name=aname&pwd=apass&password=apass"
    # let’s say this next request requires a cookie that was set in response
    request2 = ClientCookie.Request(macMailURL + "/logon.php?logoff=1")
    response2 = ClientCookie.urlopen(request2,data)
    return response2.read()

This returns the inbox html from the second response. I’ve left the commented debug statement in.
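ClientCookie is a Python 2 era module. On a current Python the standard library's http.cookiejar and urllib.request cover the same ground. The following is only a sketch of how the logon request might be put together there: make_opener and build_login_request are names invented for the example, and the form field names are simply copied from the data string above. Nothing is sent over the network until the request is actually opened.

```python
import http.cookiejar
import urllib.parse
import urllib.request

MAC_MAIL_URL = "http://mail.macmail.com"  # same base URL as in the post


def make_opener():
    # The cookie jar remembers cookies set by one response and sends
    # them back on later requests made through the same opener.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(user, password):
    # Field names copied from the data string in the post.
    fields = [("login", user), ("name", user),
              ("pwd", password), ("password", password)]
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying a body makes urllib send a POST rather than a GET.
    return urllib.request.Request(MAC_MAIL_URL + "/logon.php?logoff=1", data=body)


opener, jar = make_opener()
login_request = build_login_request("aname", "apass")

# To mirror login2Macmail, fetch the front page first so the session
# cookie is set, then post the login form through the same opener:
# opener.open(MAC_MAIL_URL).read()
# inbox_html = opener.open(login_request).read()
```

Building the body with urlencode rather than by hand also takes care of escaping any awkward characters in the password.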

So, now we need to take this page, rip out the <a> tags that point to emails, and use this info to forward the mails:

class collectTags(SGMLParser):

  def reset(self):
    SGMLParser.reset(self)
    self.urls = []

  def start_a(self, attrs):
    href = [v for k, v in attrs if k=='href']
    if href:
      #print href
      if href[0].find("/member/mail.php") != -1:
        self.urls.extend(href)

Extending SGMLParser – if it finds an <a> tag it will run the start_a method and do what I want.

This little class will stick all of these URLs (which have an href of the form /member/mail.php?id=1234) into the urls list. Note the v for k with the if statement: this lovely one-liner is why I like Python so much. The only problem is that it returns a list, which here will only ever have one element. I’m sure there’s a way of changing the one-liner but I don’t know what it is yet. Still, think of the equivalent Java, or PL/SQL! Having to reference the first element of the returned list is a small price to pay for this expressive power. Of course, some Python god will say I’m talking out of my rear end and I just need to do …, whatever that is.
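One possible way around the one-element list, sketched here for a current Python: sgmllib no longer exists in Python 3, so this uses html.parser.HTMLParser from the standard library instead, and turns the attribute pairs into a dict so that the href comes back as a plain value. The MailLinkCollector class name and the sample page are made up for the example.

```python
from html.parser import HTMLParser


class MailLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags that point at the mail viewer."""

    def __init__(self, marker="/member/mail.php"):
        super().__init__()
        self.marker = marker
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # dict(attrs).get() yields the value itself, or None if absent,
        # so there is no one-element list to index.
        href = dict(attrs).get("href")
        if href and self.marker in href:
            self.urls.append(href)


page = ('<a href="/member/mail.php?id=1234">Hello</a>'
        '<a href="/member/index.php">Inbox</a>'
        '<a name="top">no href</a>')
collector = MailLinkCollector()
collector.feed(page)
collector.close()
print(collector.urls)
```

If you would rather keep the generator style, next((v for k, v in attrs if k == "href"), None) gives the first match or None in the same way.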

Next I need to forward this mail. This is quite complex because I need to get the page displaying the forward, parse out the text of the mail and any other arguments, and then submit this as a post command back to the web server after substituting my forward mail into the string.

class getMailBody(SGMLParser):

    def reset(self):
        SGMLParser.reset(self)
        self.data = ""
        self.inForm = 0
        self.inTextArea = 0
        self.textAreaName = ""
        self.textAreaID = ""
        self.textAreaText = []

    def start_form(self, attrs):
        theForm = [v for k, v in attrs if k=='action']
        if theForm:
            self.inForm = theForm[0].find("/member/send_mail.php") != -1

    def end_form(self):
        self.inForm = 0

    def getValue( self, val, attrs ):
        return [v for k, v in attrs if k==val]

    def appendData(self,value):
        # join successive name=value pairs with "&"
        amp = ""
        if self.data:
            amp = "&"
        self.data += amp + value

    def processAttribs(self,attrs):
        if self.inForm:
            name = self.getValue('name',attrs)
            idVal = self.getValue('id',attrs)
            value = self.getValue('value',attrs)
            if not value:
                value.append( "" )
            if name:
                # print "name" + name[0]
                self.appendData( urlencode( {name[0]:value[0] } ))
            if idVal and idVal != name:
                # print "idval" + idVal[0]
                self.appendData( urlencode( {idVal[0]:value[0] } ) )

    def start_input(self, attrs):
        self.processAttribs( attrs )

    def start_textarea(self, attrs):
        if self.inForm:
            self.textAreaName = self.getValue('name',attrs)
            self.textAreaID = self.getValue('id',attrs)
            self.inTextArea = 1

    def end_textarea(self):
        if self.inTextArea:
            self.inTextArea = 0
            if self.textAreaName:
                # print "text area name" + self.textAreaName[0]
                self.appendData( urlencode( {self.textAreaName[0]:" ".join(self.textAreaText) } ))
            if self.textAreaID and self.textAreaID != self.textAreaName:
                # print "text area idval" + self.textAreaID[0]
                self.appendData( urlencode( {self.textAreaID[0]:" ".join(self.textAreaText ) } ))

    def handle_data(self,text):
        if self.inTextArea:
            self.textAreaText.append( text)
            #print text

This class will parse the forward mail page out into the data member so that I can then use it to send an http post request to the remote server, thus:

def forwardMail(url):

    replyTag = "/member/reply.php"

    # of the form /member/mail.php?id=3298
    splitURL = url.split("?")

    data = "%s&btn=Forward" % splitURL[1]
    request = ClientCookie.Request(macMailURL + replyTag)
    response = ClientCookie.urlopen(request,data)
    page = response.read()
    #print page
    mb = getMailBody()
    mb.feed(page)
    mb.close()
    forwardTag = "/member/send_mail.php"
    # fred@example.com is a placeholder address, substitute your own
    data = mb.data.replace("to=","to=fred@example.com")
    request = ClientCookie.Request(macMailURL + forwardTag)
    response = ClientCookie.urlopen(request,data)
    page = response.read()

Of course, replacing fred with your mail. I’ll leave working this out to the reader.
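One thing to watch with a plain string replace is that it will also hit any other field whose name happens to end in "to=". A slightly more careful sketch, for a current Python, decodes the form data back into pairs, sets the one field, and encodes it again. The set_recipient helper and the sample field names other than "to" are invented for the illustration.

```python
from urllib.parse import parse_qsl, urlencode


def set_recipient(form_data, address):
    """Return form_data with the value of the "to" field set to address."""
    # Decode into (name, value) pairs; keep_blank_values keeps empty
    # fields such as "to=" that parse_qsl would otherwise drop.
    pairs = parse_qsl(form_data, keep_blank_values=True)
    updated = [(name, address if name == "to" else value)
               for name, value in pairs]
    return urlencode(updated)


sample = "subject=Fwd%3A+hello&to=&body=some+text+to%3Dkeep"
print(set_recipient(sample, "fred@example.com"))
```

Only the field actually named "to" is changed, and the address is escaped properly on the way back out.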

Macmail does delete through the move command. I reused the URL list from the last page:

def deleteMail(urlList):
    deleteTag="/member/move.php"
    deleteData = []
    for val in urlList:
        # each URL looks like /member/mail.php?id=3298
        bits = val.split("=")
        #print bits
        theID = bits[1]
        deleteData.append(theID)
    data = "delete[]=" + "&delete[]=".join(deleteData)
    #print data
    request = ClientCookie.Request(macMailURL + deleteTag)
    response = ClientCookie.urlopen(request,data)
    return response.read()

This will return the next page after all of the mail displayed on the current one is deleted.  Here we glue it all together:

###################################
# Processing body
###################################

page = login2Macmail()

while True:
    parser = collectTags()
    parser.feed(page)
    parser.close()
    if not parser.urls:
        break
    for url in parser.urls:
        forwardMail(url)

    # Submit delete request for all URLs from first page
    # get page again

    page = deleteMail(parser.urls)

    #print page

## Now delete all sent mail

request = ClientCookie.Request(macMailURL + "/member/index.php?folder=SentItems")
response = ClientCookie.urlopen(request)
page = response.read()

while True:
    parser = collectTags()
    parser.feed(page)
    parser.close()
    if not parser.urls:
        break

    page = deleteMail(parser.urls)

    #print page

request = ClientCookie.Request(macMailURL + "/member/empty_trash.php")
response = ClientCookie.urlopen(request)
page = response.read()

This little control block loops through all of the inbox pages until there are no more mail viewing URLs left, forwarding the mails and deleting them as it goes. Then it goes to the sent items folder and deletes everything there, and finally it calls the empty trash function to be polite. I don’t want to leave a load of junk on that server; the mail account has been very useful over the years and I wouldn’t like to annoy the people running it.
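The shape of that loop can be tried out without touching a real server. The sketch below, for a current Python, passes the page-parsing, forwarding and deleting steps in as functions and adds a page limit in case a page never empties. drain_folder, build_delete_payload and the fake mailbox are all made up for the example. The payload helper assumes the /member/mail.php?id=NNN form of URL described earlier.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_delete_payload(urls):
    # Pull the id out of each /member/mail.php?id=NNN style URL.
    ids = [parse_qs(urlsplit(url).query)["id"][0] for url in urls]
    # safe="[]" keeps the brackets literal, as in the hand-built string.
    return urlencode([("delete[]", mail_id) for mail_id in ids], safe="[]")


def drain_folder(first_page, find_urls, handle_url, delete_all, max_pages=100):
    """Process and delete page after page until no mail links remain."""
    page = first_page
    processed = 0
    for _ in range(max_pages):      # guard against a page that never empties
        urls = find_urls(page)
        if not urls:
            break
        for url in urls:
            handle_url(url)
            processed += 1
        page = delete_all(urls)     # returns the next page, as deleteMail does
    return processed


# Stand-in for the server: two "pages" of mail, then an empty inbox.
pages = [["/member/mail.php?id=1", "/member/mail.php?id=2"],
         ["/member/mail.php?id=3"],
         []]
forwarded, payloads = [], []


def fake_delete(urls):
    payloads.append(build_delete_payload(urls))
    pages.pop(0)
    return pages[0]


count = drain_folder(pages[0], find_urls=lambda page: page,
                     handle_url=forwarded.append, delete_all=fake_delete)
print(count, payloads)
```

For the sent items pass, the same function would be called with a handle_url that does nothing, since those mails only need deleting.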

For the record, the import statements at the top are these:

import ClientCookie, urllib2
from sgmllib import SGMLParser
import htmlentitydefs
import sys
from urllib import urlencode

Have fun, and don’t eat too much Java, it’s bad for you and takes too long, life is short, use proper powerful tools.