Sunday, June 25, 2006

Newspaper Design Algorithm

As I mentioned in an earlier blog entry, the part of the FeedJournal project I have felt most insecure about is how to design the algorithm for laying out the articles in the newspaper. This is a critical step for a number of reasons: it has to look like a newspaper, it has to read like a newspaper, and it has to be pretty (the output PDF is essentially a part of the FeedJournal GUI).

Anyway, I am happy to report that significant progress has been made in this area. I developed an algorithm which dynamically creates a newspaper with customizable:

  • number of paragraphs
  • margins
  • paper size
  • font
  • spacing between rows and various article elements

I have also implemented support for a headline font size which is a function of the article's importance/size.

Together with the masthead (newspaper lingo for the first page logotype) the whole creation starts to look pretty snazzy, if I may say so myself.

The algorithm implementing the layout is pretty simple, but efficient. First, I gather the collection of articles which are due to be published in the upcoming issue. I sort these according to size/importance; this step is made very simple by the new generic collection classes in .NET 2.0. Then I take the first article which fits into the next available space on the current page, remove it from the collection and mark the page space as occupied. If no article fits the remaining space, I publish as much of an article as fits on the page and add a page jump to another page where the article is continued. Basically that's all there is to it, and this algorithm works very well so far.
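To make the steps above concrete, here is a minimal sketch of that kind of greedy layout. It is written in Python for brevity rather than in the C# that FeedJournal uses, and it is not the actual FeedJournal code. The names (Article, Page, lay_out) and the idea of measuring space in "lines" are assumptions made for illustration, and the sketch simply continues a split article on the following page.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    size: int  # hypothetical unit: lines of page space the article needs


@dataclass
class Page:
    capacity: int
    # Each block is (title, lines placed on this page, continued_on_later_page).
    blocks: list = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.capacity - sum(lines for _, lines, _ in self.blocks)


def lay_out(articles, page_capacity):
    """Greedy layout: biggest articles first, split one when nothing fits."""
    pending = sorted(articles, key=lambda a: a.size, reverse=True)
    pages = [Page(page_capacity)]
    while pending:
        page = pages[-1]
        if page.free == 0:
            pages.append(Page(page_capacity))
            continue
        # Take the first article that fits into the remaining space.
        fitting = next((a for a in pending if a.size <= page.free), None)
        if fitting is not None:
            pending.remove(fitting)
            page.blocks.append((fitting.title, fitting.size, False))
            continue
        # Nothing fits: print what fits of the largest pending article,
        # then jump to a fresh page for the rest.
        article = pending.pop(0)
        remaining = article.size
        while remaining > page.free:
            placed = page.free
            page.blocks.append((article.title, placed, True))
            remaining -= placed
            page = Page(page_capacity)
            pages.append(page)
        page.blocks.append((article.title, remaining, False))
    return pages


if __name__ == "__main__":
    issue = [Article("Lead story", 80), Article("Second story", 60)]
    for number, page in enumerate(lay_out(issue, page_capacity=100), start=1):
        print(number, page.blocks)
```

In this sketch a block flagged True is where a real layout would print the "continued on page N" jump line.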

In one of the coming blog posts I will attach a sample PDF file to showcase FeedJournal.

Monday, June 19, 2006

SQL Server Strangeness

So I got up at 4:30 today in order to get some serious work done on FeedJournal before going to work and before the baby woke up. Well, that was my plan at least. The first part went fine; I got out of bed, went for a short run and showered. By then the baby was awake though, so I had to do some multitasking with one arm holding the baby while the other hacked away at the keyboard. But that's actually much nicer than it sounds. Seriously.

Pretty soon I ran into problems with my SQL database, which had been working flawlessly until now. Whenever I tried to connect to it I got an SqlException: "Failed to generate a user instance of SQL Server due to a failure in starting the process for the user instance. The connection will be closed." At first I thought it was due to an incorrect connection string, but everything seemed fine and the database was in the right location. So I hit the waves of the WWW to try to find anyone with similar problems out there. Loads of people had run into the same problem, but in most cases they were using Remote Desktop, which was the root of their problems. I'm not using that, so I was left to my own devices again.

I tried to restart the development environment. I tried to manually restart the SQL Server services, but to no avail. It wasn't until I did a reboot that the problem went away, and now everything is working fine again. Crossing my fingers.

Sunday, June 18, 2006

NP-Complete

I have been sick with the flu for the last week and still don't feel so great. For this reason the programming hasn't really proceeded as I expected. However, I have been doing a lot of thinking about the database and class designs. As soon as I feel better I will work on laying out the PDF newspaper dynamically, which I realize will be a tough nut to crack.

Basically the problem is related to the classic computer science problem of bin packing, which is NP-hard (its decision version is NP-complete). NP is a computer science term standing for "nondeterministic polynomial time", and the NP-complete problems are the hardest problems in that class. In practice it means that no efficient algorithm is known for finding the optimal solution. My approach will be to take some shortcuts and make compromises so that the layout will be acceptable from a design viewpoint, while not digging myself into a hole with a too complex layout algorithm.
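As an illustration of the kind of shortcut that is commonly taken with bin packing, here is a minimal sketch of the well-known "first-fit decreasing" heuristic. It is written in Python for brevity and is not FeedJournal code. The function name and the plain integer sizes are assumptions for the example. It does not guarantee the optimal number of bins, but it is simple and fast.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into bins of the given capacity with a greedy heuristic.

    Items are handled largest first, and each one goes into the first bin
    that still has room for it. A new bin is opened only when none does.
    """
    bins = []  # each bin is a list of the item sizes placed in it
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} cannot fit in any bin")
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
    return bins


if __name__ == "__main__":
    print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
```

For a newspaper, a "bin" would correspond to a page or a column and an item to an article, although real articles can also be split across pages, which classic bin packing does not allow.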

In the meantime, while waiting for the fever to go away, I am reading some academic papers on newspaper layout and bin packing solutions. I don't think it will help my sickness, but it does make me sleepy.

Monday, June 5, 2006

The Feed Format Jungle

I have started the implementation of my project in C# Express Edition, and one of the first things I have stumbled upon is the frustration of having to deal with many different XML feed standards. There are RSS and Atom, each of them with several different sub-versions. But that's not all. We also have a slew of Internet cowboy hackers who don't have any desire at all to follow these standards. In short, RSS/Atom land is a jungle. Time to take out the machete! When researching the options for a suitable machete for the feed jungle, the following four caught my attention:

  1. Atom.NET + RSS.NET
  2. IP*Works
  3. Microsoft's RSS library, included in IE7
  4. Rolling my own component based on .NET's XML support

Atom.NET + RSS.NET

These are two separate open-source libraries, implemented in C# .NET, which enable users to work with the two feed standards and all of their sub-standards through a .NET programming interface. Unfortunately the two components expose two interfaces without much similarity. In addition to this, the libraries are no longer in active development. Instead, the author is creating a commercial closed-source version of the components.

IP*Works

When you register your copy of Visual Studio 2005 Express Edition, one of the freebies that Microsoft offers you is a license for IP*Works' RSS component. The word "free" misled me for a while, until I realized that I was being offered a free developer license only, without any rights to distribute the component with the applications I build in Express.

Microsoft's RSS library, included in IE7

With the upcoming Internet Explorer 7 (the library is already included in Beta 2), Microsoft has really outdone itself with the RSS/Atom support. Included in the browser will be a feed repository that any application can use to know which feeds are of interest to a user. Articles and their read/unread state will also be stored here. However, IE7 requires Windows XP or above, cutting off a large piece of the current end-user segment.

Rolling my own component based on .NET's XML support

Of course, being a developer, you are always attracted by the possibility of rolling everything yourself. However, considering the abundance of RSS/Atom formats out there, attempting this would be suicide during the short time available to build FeedJournal within the contest.

Conclusion

After some prototyping with the different options I decided to go with the open-source Atom.NET and RSS.NET components. However, I quickly noticed some bugs and limitations, which I fixed in the components (the wonder of open source!). I am wrapping Atom.NET and RSS.NET in my own classes "Article" and "Feed", which have different constructors for the different feed types.
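The wrapping idea can be sketched roughly as follows. This is not the FeedJournal code, and it does not use Atom.NET or RSS.NET: it is a Python illustration in which the standard library XML parser stands in for those components. The method names (from_rss_item, from_atom_entry, from_xml) are assumptions for the example, and only plain RSS 2.0 and Atom 1.0 documents are handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace


@dataclass
class Article:
    title: str
    link: str

    @classmethod
    def from_rss_item(cls, item):
        # RSS 2.0: <item><title>..</title><link>..</link></item>
        return cls(item.findtext("title", default=""),
                   item.findtext("link", default=""))

    @classmethod
    def from_atom_entry(cls, entry):
        # Atom 1.0: the URL is the href attribute of the <link> element.
        link = entry.find(ATOM_NS + "link")
        href = link.get("href", "") if link is not None else ""
        return cls(entry.findtext(ATOM_NS + "title", default=""), href)


@dataclass
class Feed:
    title: str
    articles: list

    @classmethod
    def from_xml(cls, text):
        """Build a Feed from RSS 2.0 or Atom 1.0 text, hiding the difference."""
        root = ET.fromstring(text)
        if root.tag == "rss":
            channel = root.find("channel")
            if channel is None:
                raise ValueError("RSS document has no <channel> element")
            items = [Article.from_rss_item(i) for i in channel.findall("item")]
            return cls(channel.findtext("title", default=""), items)
        if root.tag == ATOM_NS + "feed":
            entries = [Article.from_atom_entry(e)
                       for e in root.findall(ATOM_NS + "entry")]
            return cls(root.findtext(ATOM_NS + "title", default=""), entries)
        raise ValueError(f"unsupported feed format: {root.tag}")
```

The rest of the application then only ever sees Feed and Article objects, no matter which format the feed was published in.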