Discarded Data May Be Hiding Some Gold
The poet Kim Chun-soo once wrote: “When I called his name, he came to me and became a flower.” There is a lot of data around us that is simply abandoned and forgotten. Big data, however, attempts to bring this data back to life.
When someone is taken to an emergency room, all sorts of machines record the patient’s state. However, this precious data is not stored. It is discarded once the doctor in charge has analyzed it. If such data were stored, other doctors could later use it for case studies, or for diagnosis and treatment when a patient is transferred.
Recently, a woman who was abducted in Suwon called the police before she was murdered. As was widely reported, the nearby base stations also received the signal, but only one recorded it, in order to charge the transmission fee. The other base stations discarded the data. Had data from the other stations been available as well, it might have been easier to locate the woman, and perhaps her life could have been saved.
From relentless chitchat to constantly uploaded everyday pictures to well-written research papers and extremely moderate blogs, there is plenty of diverse information on the web. Sensors are accumulating an enormous amount of information as well. Most of this abundant data around us is thrown away. The gist of big data is to collect and utilize this data as a new resource in itself.
Alberta, Canada, has been flourishing since 2003, when the province (I am Canadian) began to mine oil sands. The oil sands had previously been dismissed as having little economic value, until international oil prices rose above 40 dollars per barrel. Big data is like oil sands in a sense: it is perhaps the last remaining resource, one that someone will make use of at some point. In any case, storage prices are falling day by day. Data mining technology is improving at a rapid pace. Cloud environments allow data to be stored and searched more reliably. For those reasons, it no longer makes a big difference whether you discard the data or store it. And sometimes you feel lucky when something you could have thrown away turns out to be useful at a later date.
From now on, it will not be easy to lie about having gone to Jeju on a business trip. Location information systems can quickly reveal whether someone is in Jeju or Seoul. They also locate people quite accurately: for instance, they can determine whether you are at the airport or at a hotel in the city. These things are already happening. When you open a new tab in your browser, it shows the websites you visit most frequently. You may not even have realized which sites you visit most often. There is a website that displays the connections between me and the scholars who have collaborated with me on papers. This, too, happened without my knowledge. Even better, the site also shows each paper’s citations. When I have a good idea, I immediately search for relevant keywords to find out whether a similar idea already exists. At those times I often worry that Google might combine those keywords and predict what I am thinking. Of course, it is a ridiculous notion. However, Google might do just that.
On August 4, 2006, AOL released a compressed text file for research purposes. The file contained records of 20 million keyword searches made by 650,000 users over the course of three months. For privacy reasons, AOL had made sure the users could not be identified by name. Within a week, however, the anonymous user number 4417749 was discovered to be Thelma Arnold. Her real name and picture were revealed just by piecing together a few of the keywords she had searched for.
Combining the keywords that I have searched for recently would show that I am quite obsessed with security statistics. The unusual keywords I type into search engines could easily reveal that I might have some special ideas. Google could assign a task force to steal my idea. Of course, they would soon be disappointed once they realized what nonsense my idea was. That was just a fictitious story, written while letting my imagination wander about the upcoming age of big data. However, we will soon see that it is not simply fiction. The era of big data has already begun. Some people are creating added value by organizing the pieces lying about, others are just watching, and some don’t even realize that the world is changing in front of their eyes. Is it not a problem, though, if you see oil sands and still don’t understand their value? Counting its oil sands, Canada has come to hold the second-largest oil reserves in the world, after Saudi Arabia. In any case, data will keep piling up without limit.