Thursday, January 3, 2013

My data problem

First off, I must apologise for this rambling post, but this is more about me throwing my ideas down than having something great for everyone to read. Maybe one day I'll come back to this and revise it as I get clearer on what I'm trying to achieve.

Now that I'm getting older, one of the things I've realised is that I have a problem: I have too much data. I have tonnes of photos, movies, documents and books, with no time to manage them, let alone to manage and organise anything new.

Not only do I have too much data, I don't have a proper place to put it all. I have stuff from my uni days that has survived, but it's spread across multiple files, folders and even several machines. Also, some of the data I have in one place is only playable somewhere else because of file format compatibility, etc.

One last thing: despite all my complaining about this data, I also really, REALLY want to keep it and back it up.

the problem

Rather than rambling, it's usually better to state clearly what the problem actually is.
The problem is that I have: 
  • lots of data
  • no place to put it
  • no way of backing it up
  • no way to organise it
  • and too many ways to view it
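
As a first step towards the "no way to organise it" item, here is a minimal Python sketch for spotting files that exist in more than one place. It is only an illustration, not something the rest of this setup depends on: the function names (`file_digest`, `find_duplicates`) and any folder paths are made up, and it assumes two files count as duplicates when their contents hash to the same SHA-256 value.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large movies don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Group files under the given folders by content.

    Returns a dict mapping digest -> sorted list of paths, keeping only
    the groups that contain two or more files.
    """
    by_digest = defaultdict(list)
    for root in roots:
        for folder, _subdirs, names in os.walk(root):
            for name in names:
                path = os.path.join(folder, name)
                if os.path.isfile(path):
                    by_digest[file_digest(path)].append(path)
    return {
        digest: sorted(paths)
        for digest, paths in by_digest.items()
        if len(paths) > 1
    }
```

Something like `find_duplicates(["/mnt/old_laptop", "/mnt/uni_stuff"])` (both paths are placeholders) would return each content hash along with the paths that share it, which is a starting point for deciding what to keep before anything gets copied to a server.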

the vision

This is just a list of components with their functions listed underneath. I may learn later down the track that this setup is all backwards, but at least I have a start.

(note: after reviewing this post I realised I need pictures and diagrams... probably not graphs)

data server

This is where all the data is stored. I hope to have some sort of RAID backup of the data as well, for preservation. Some applications that don't require a local interface may live on this box, e.g. a torrent client like uTorrent. The drives here would probably all be shared as Samba directories.

media frontend

This will sit next to the big TV, and most of the media (photos, music, movies) will be viewed on it. For the time being I've decided that XBMC will play the major role here. A TV tuner may also live here, with a backend application recording media to the data server.

workstation(s)

This is just a machine where all the user applications are kept, while the majority of the data stays on the server. I'm not sure if I'll have more than one of these in the future, and if I do, I'm not sure how I'll manage the users on the server =/

the plan

I'm neither the first nor the last person to have this problem, so it's great that there are a lot of solutions out there. However, there are a LOT of solutions out there, so I'll need a plan for how best to choose between them.

First up, this is not going to be the first system I put together, so I'm going to avoid buying brand new hardware if I can help it (hard drives may be the exception). I already have too many old machines lying around unused and mostly accumulating dust, so this is my way of saving the environment.

Also, by forcing all these bits together the first time, I hope to learn a lot and improve on it in version 2.




Monday, November 19, 2012

Kaggle


My life over the last 5 years has been dominated mostly by a voracious consumption of all media, ranging from books and television (GoT, Breaking Bad, True Blood) to video games, movies and the pointlessly amusing reddit.

So, in an attempt to better myself, I've taken up a few new hobbies. I'll start with Kaggle because it seems more interesting than my other achievement this week (learning to sew a button).

Making data science a sport
So what is Kaggle? It's a place where people gather and form teams to analyse data across multiple competitions. The team that performs best in each competition takes home the loot, which happens to be $3,000,000 for the biggest open competition so far. This alone has convinced me that this won't be a completely pointless hobby. And even if I don't win anything, at least the prize money shows there are many people out there who view data science as a valuable skill.

What drew me to Kaggle was that, despite the competitive goals in each competition, there is a strong emphasis on collaboration in the forums, and this is always a good sign for a beginner. And like most people, I have an interest in data. Well, I think most people like data; I only assume so because elections always seem to be about polling numbers.

The first Kaggle competition I've entered is Titanic: Machine Learning from Disaster.

Having seen the passenger list, it seems that Jack Dawson's name wasn't on there. He really was a stowaway.
Having made one submission, I'm ranked in the lowly 600s (out of 650). Oh well, it's a start, and hopefully I'll improve in the weeks to come.

Wish me luck.

Friday, November 16, 2012

First!

So there isn't a lot to write, but I'm starting (we'll see about continuing) this blog to keep track of my interests and my personal growth over time. I haven't recorded much in the past, but now is the time, I guess. Wish me luck.