My friend Roger informs me that this is a single library call in PHP. I once did this by running the page through JTidy so I could get at the HTML tags through an XML DOM, but that was very silly.

Been tearing my hair out: I want to be able to take the output from a URL and put it into a portlet. I therefore want to rewrite where the <form> tag on that server posts back to, so that I can put the portlet in front of it.

JDK 1.4 has a lovely regular expression package (java.util.regex), so I don’t have to import the GNU RegExp libraries.

Great, says I. The only problem is that it takes its input through the new CharSequence interface.

All of the examples are about reading files, not the contents of a URL. A URL only seems to hand me a plain InputStream, which isn’t the same as a FileInputStream, and the examples jump through all kinds of hoops that a plain InputStream doesn’t have the methods for.

OK, have a look at the example in Java Examples in a Nutshell. It uses a byte[] buffer to read from the URL’s InputStream. Maybe I can put this into a StringBuffer and then use that, because StringBuffer implements CharSequence.
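A minimal sketch of that idea, to show that a StringBuffer can be handed straight to the regex classes. The class name CharSequenceDemo, the helper countMatches and the sample markup are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Hypothetical helper: counts regex matches in any CharSequence
    // (String, StringBuffer, java.nio.CharBuffer, ...).
    public static int countMatches(CharSequence input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuffer page = new StringBuffer();
        page.append("<form action=\"mvtest2.jsp\">");
        page.append("<a href=\"mvtest2.jsp\">again</a>");
        // No toString() needed: StringBuffer is itself a CharSequence.
        System.out.println(countMatches(page, "mvtest2\\.jsp"));
    }
}
```

So the regex side is happy with a StringBuffer; the remaining question is how to fill one from a stream.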

StringBuffer, quite logically, doesn’t have a way of appending arbitrary numbers of bytes from an array, only chars.

Then, dear reader, I realised that an InputStreamReader class will read chars. Finally, after about 2 hours googling my bonce off:

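import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RewriteURL 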
{
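  /**
   * Replaces every match of a regular expression in a char sequence.
   * Adapted from http://java.sun.com/developer/JDCTechTips/2002/tt0604.html#tip2
   * 
   * @param in the char sequence to search
   * @param searchFor the regular expression to search for
   * @param replaceWith what to replace each match with ($ and \ are special here)
   * @return the input with every match replaced
   */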
  private static String replace
    ( CharSequence in
    , String searchFor
    , String replaceWith
    )
  {
    StringBuffer out = new StringBuffer() ;
    // Create a pattern to match search
    Pattern p = Pattern.compile(searchFor);
    // Create a matcher with an input string
    Matcher m = p.matcher(in);
    boolean result = m.find();
    // Loop through and create a new String 
    // with the replacements
    while(result) {
        m.appendReplacement(out, replaceWith);
        result = m.find();
    }
    // Add the last segment of input to 
    // the new String
    m.appendTail(out);
    
    return out.toString();
  }

  /**
   * 
   * @param args
   */
  public static void main(String[] args) throws Exception
  {
    
    URL url = new URL("http://some-server/someurl") ;
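    // Decodes bytes with the platform default charset; pass a charset name
    // as a second constructor argument if the page uses a different one
    InputStreamReader urlContents = new InputStreamReader(url.openStream());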
    
    StringBuffer sb = new StringBuffer() ;
    
    char [] buffer = new char[4096] ;
    
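    // read() on a Reader returns a count of chars, not bytes
    int charsRead;
    while( ( charsRead = urlContents.read(buffer)) != -1 )
    {
      sb.append(buffer,0,charsRead) ;
    }
    urlContents.close() ;
    
    // searchFor is a regex, so the dot is escaped to match a literal "."
    System.out.println(replace( sb, "mvtest2\\.jsp", "someOtherForm.jsp" ));
  }
}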

I’m not sure how well this would work with a large file, since it holds the whole page in memory, but for trivial stuff it’s OK. There are some useful things on the GNU Regexp site as well.
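For bigger pages, one option might be to rewrite line by line instead of buffering everything first. This is only a sketch under some assumptions: no match spans a line break, it is fine for every line terminator to come out as "\n", and Matcher.quoteReplacement is available (it arrived after JDK 1.4). The class name StreamingRewrite and its rewrite method are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingRewrite {
    // Hypothetical helper: copies 'in' to 'out' one line at a time,
    // replacing every regex match. Only one line is held in memory.
    public static void rewrite(Reader in, Writer out, String regex, String replaceWith)
            throws IOException {
        Pattern p = Pattern.compile(regex);
        // Make '$' and '\' in the replacement text literal
        String safe = Matcher.quoteReplacement(replaceWith);
        BufferedReader lines = new BufferedReader(in);
        String line;
        while ((line = lines.readLine()) != null) {
            out.write(p.matcher(line).replaceAll(safe));
            out.write('\n'); // readLine() strips the terminator, so put one back
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for new InputStreamReader(url.openStream())
        Reader in = new StringReader("<form action=\"mvtest2.jsp\">\n</form>\n");
        StringWriter out = new StringWriter();
        rewrite(in, out, "mvtest2\\.jsp", "someOtherForm.jsp");
        System.out.print(out);
    }
}
```

In a portlet the Writer would presumably be the response writer, so the rewritten page goes out as it is read.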