Sunday, July 30, 2006

Writing Documentation

Writing software documentation is probably the most boring part of a project for a developer. However, having blogged during the development process makes it easier. I am able to take some blog entries and paste them into the help files with very little editing. Also, by continuously writing blog articles during the past months, it is easier to fight writer's block.

I am using the free Shalom Help Maker to generate a standard CHM-file, easily accessible from within FeedJournal. Writing the help files is instructive because it places you in the user's shoes, and any design flaw becomes much more apparent. However, I have been working hard to keep the application design simple, and I hope that it is intuitive enough for users, so that they will not need to resort to the help system.

Saturday, July 29, 2006

Deep Linking from RSS

One of the more unique and perhaps controversial features of FeedJournal is that it can filter out the meat of an article published on the web.

How does it accomplish this? FeedJournal has four ways of retrieving the actual content for the next issue.

Actual Content
In the trivial case, a site (like this blog for example) decides to include the full article text within its RSS feed. FeedJournal simply published the content; no surprises here. By the way, this is how all standard RSS aggregators work. The problem is when a site decides to only publish summaries or teasers of the full article text. FeedJournal needs to deal with this because it is an offline RSS reader, users cannot click on their printed newspaper to read the full article.

Linked Content
The <link> tag inside the RSS feed specifies the URL for the full article. In case the RSS only includes summaries of the full articles, FeedJournal retrieves the text from this URL.

Rewritten Link
In most cases, just following this link is not a good solution. The web page typically includes lots of irrelevant content, like a navigation menu, a blogroll, or other articles. FeedJournal lets the user write a regular expression for each feed, automatically rewriting the article’s URL to the URL of the printer-friendly version. As an example the URL to a full article in International Herald Tribune is http://www.iht.com/articles/2006/07/28/news/mideast.php while the link to the printer-friendly version is http://www.iht.com/bin/print_ipub.php?file=/articles/2006/07/28/news/mideast.php By inserting bin/print_ipub.php?file=/ in the middle of the URL we will reach the printer-friendly article. This article is much more suitable for publishing in FeedJournal, because it more or less only contains the meat of the article.

Filtered Content
“More or less”, I said in the last sentence. There are usually some unwanted elements left in the printer-friendly version, like a header and a footer. These can be filtered out by letting FeedJournal begin the article after a specified substring in the HTML document source. Likewise, another substring can be selected as ending the relevant content.

By applying these functions it is possible to scoop, or extract, the meat of almost any web published article. Of course it is only necessary to do this once for every feed. To my knowledge, FeedJournal is the only aggregator who has the functionality described in the last three sections.

Is this legal, you ask? Wouldn’t a site owner require each user to actually visit the web site to read the content and click on all those fancy ads sprinkled all over? Well, my stance is that if the content is freely available on the web, I am free to do whatever I want with it for my own purposes. Keep in mind that we are not actually republishing the site’s content, we are only filtering it for our own use. Essentially, I think of this as a pop-up or ad blocker running in your browser.

What is interesting to note is that some web sites have tried to include in their copyright notice a paragraph limiting the usage of their content. Digg.com, for example, initially had a clause in the their copyright effectively prohibiting RSS aggregators from using their RSS feeds! Today, it is removed.

As long as FeedJournal is used for personal use, and the issues are not sold or made available publicly, I do not see any legal problems with the deep linking.

Friday, July 28, 2006

Time for Code Freeze

Time, quality, resources and scope. Those are the four variables in software project management. As the deadline closes in I only have the luxury to change scope. Sure, there are more features I planned to get into this version, but the scope will be cut in order to make the release stable and have a timely delivery. Time is a rare resource for me these days with being a new father , having a full-time job, following the latest news about the regional conflict, and blogging/developing FeedJournal . Despite that, I am proud of what I have accomplished so far with my project in Visual C# 2005 Express Edition.

One week remains until release, and the time has come for Code Freeze: no more new features. Until August 6th I will work on finalizing documentation, web site, and of course testing.

FeedJournal will become a commercial project in version 2.0. Until then the fully functional version 1.0 will be the one submitted for the Made In Express Contest under a shared source license. That basically means that the source is available but there are no rights to use this source code in your own projects. I plan to add plenty of cool features before a commercial release plus some optimizations under the hood. I would like to thank everyone who contacted me with feature requests or comments. What you should expect to see in future versions are:

  • Sections
  • Layout templates
  • Images
  • Browser integration for publishing a selected web page in the next issue
  • PDF import (nice for those online crossword puzzles and Sudokus)
  • Advanced article scoring using user-defined keywords and extended RSS tags
  • Scheduled publishing and printing
  • HTTP authentication and cookie support
  • Improved throttling support and adhering to servers’ TTL setting

There are also some ideas I had envisioned early on in the project, prior to starting implementation, that have been moved to the recycle bin. Nothing strange with this, it is a normal reality check once you get down to the fine points of how things are supposed to work. For example, I had planned to rank articles according to web popularity (Technorati, Digg, delicious, Google, Yahoo, etc.). After much research of the various service APIs it is clear to me that this is simply not possible. Having x articles would mean sending x web requests to the various services. Until they support a technique for bundling together multiple requests into one, this feature will not be a part of FeedJournal.

In the meantime, I am looking for serious beta testers for future FeedJournal versions who will be rewarded for constructive feedback and testing.

Friday, July 21, 2006

FeedJournal Sample Issue


Yes folks, we have a world premiere, the first sample of a FeedJournal issue is available for your viewing pleasure! Let me remind you that the purpose of the FeedJournal project is to generate a PDF newspaper based on RSS feeds, intended for printing. The PDF file is available for download here. In order to open it you will need Adobe Reader or Foxit Reader.

The content spans a selection of last week’s blog entries from the Made In Express Contest finalists. I chose these feeds, not because I want to plug the contest, but because I want to avoid breeching copyright law for republishing other blogs’ articles.

So what can you see in this sample issue? The following settings are in use: A4 paper size (a European standard), 4 columns, 0 points line spacing, 8 points column spacing, 30 points page margin and 10 points margin between headline and article text. Furthermore, the headline is Times New Roman (22 pt bold), article text is Times New Roman (8pt), publishing date is Lucida Console (5pt) and news source is Arial (9pt italic). All of these settings, and others, can be customized from the application’s Options dialog.

But there are also things that you cannot see in the sample. Like for example images. Beside the masthead (newspaper lingo for the first page logo), there are no images. Future versions of FeedJournal will include support for images contained in blog entries. Another thing not visible in the sample issue is the already implemented support for long articles to jump between pages if they do not fit. The reason is simply that there weren’t any long articles available in the selected time span.

And let me finally take the opportunity to congratulate my fellow finalists for getting published in the newspaper! ;-)

I would appreciate any feedback regarding the sample issue, don’t hesitate to contact me by using blog comments or e-mail me at contact@feedjournal.com.

Monday, July 17, 2006

Rockets and Progress

In an instant the situation here has deteriorated. Rockets explode closer and closer to our home (so far a safe distance away), and it will surely take time before we will see and end to it all. Native Israelis are more relaxed about the situation, more adapted, or perhaps it's just an image they are putting up. Having recently become a father makes me worry about my family's security. Between closely monitoring the latest headlines, spending time with our 1.5 month old daughter Noa and working my butt off at my day-job, the contest deadline is slowly closing in.

I started to write the FeedJournal help file but I haven't decided on a format yet. HTML is attractive because I can easily host an online version of the help files, while keeping them up to date with minimal maintenance. CHM files are more standard and look more professional though. The jury is still out...

The application itself is starting to be finalized and I haven't forgotten my promise to submit a sample PDF newspaper issue here. Patience! For now here is a screenshot of the main window.

Wednesday, July 5, 2006

Why FeedJournal? (or why the information age matters)

The idea of an RSS syndicated newspaper came to me when I was subscribing to a morning newspaper last year. I hadn't had a morning paper for years then, so it was all a bit new to me. I really enjoyed to have access to news hot off the press, which I could read without having to stare into the computer monitor; for example in the comfort of my bed, sofa, or while traveling. But there were two things I strongly disliked about it: the monthly subscription was fairly expensive and I didn't really care for a majority of the content in the newspaper. The competing newspaper had a few sections that I would much rather read, but I couldn't afford to spend my time reading more than one morning newspaper. I knew that there were better ways out there for accessing relevant news in a comfortable way. I just needed to find them.

Content is king. There are no two ways about it. When people were talking about the information age ten-fifteen years ago I didn't get it. I didn't see how the management and distribution of content could become so central in a society that it would name a whole time period. But I am starting to see it now, how a low signal-to-noise ratio can kill the greatest endeavor; how the delivery of timely and to the point information can be of extreme value; and how the production of high quality content in itself can form an outstanding business plan.

I’ll say it again, the "production of high quality content in itself can form an outstanding business plan". Traditionally and historically the great content producers also had to be great content deliverers in order to survive. They had to make sure that the newspapers or books were printed and delivered to make any kind of business. Today, all of this has changed. Today, we have electronic delivery of the same content that used to make up newspapers and books, through for example the World Wide Web.

But along with the change of delivery method we as customers are losing out on some of the great and time-proven ways of accessing the content. We need to make a compromise between reading a newspaper online with all the latest events, or in paper format using news that in our fast-paced life are already old (just by a few hours but still old). This is where FeedJournal comes into play. FeedJournal serves as a content deliverer and presents information from whichever sources you want in a traditional format that was the default way of reading news for a very, very long time.

Of course, this is just one out of many of FeedJournal’s benefits over a traditional newspaper. It also empowers the user with the option of collecting multiple feeds to create a newspaper that is tailored for her own needs: with the local team’s results, the stock portfolio’s development or even personal e-mail. It gives the user the possibility to choose the deadline to be the exact moment she wants, not six hours before it will actually be read. And of course the paper size can be decided: A4, A3, letter size or why not an index card version that you can put in your Hipster-PDA? Gone are the monthly subscription fees for delivery, you only need to pay for the actual content in case your favorite news source doesn’t provide it for free on the web already.