So I have this project in my head that I want to take a crack at... but I'm not sure how to begin. I know how awesome the VBForums community is, so I'm hoping you'll forgive me if this isn't the best place to post a question like this one. This will be a project coded in VB.NET (2010 or 2012 Express), but I'm not even at that point yet. Let me explain.
If you've read any of my former posts, you'll know I'm by no means new to VB or .NET; however, I usually stick to simpler programs and solutions to problems.
To understand the reason for this project, I think it's right that I share a little about myself. I'm a younger, computer-savvy, yet uneducated guy. I've wanted to get into the computer business for a while, but given my lack of experience I've been stuck in sales, specifically cell phones. Over the past two years, although I've become much more knowledgeable about phones, my computer knowledge has started to stagnate. As I'm now looking to migrate to a more computer-centric career, I'm looking at different ways to increase my knowledge. I'm going to be researching a lot on computer hardware on a site like newegg.com, but that's a lot of reading, and as anyone who's programmed before would know... there's always a better way.
That said, here's my goal in the form of steps:
1. Figure out how to access Newegg's comment stream
2. Download this data...
3. ...into some kind of data structure in .NET
4. Parse the data for keywords and word combinations
5. Aggregate this information into something useful
6. Read and learn from this useful information
In this list:
#2 I can probably find resources for
#3 I can figure out, most likely
#4 is something I do for fun all the time
#5 is part of #4
#6 is my goal
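For what it's worth, here's roughly what I have in mind for steps 4 and 5. This is only a sketch, written in Python for brevity (I'd expect the same counting to port to VB.NET with a `Dictionary(Of String, Integer)`). The stop-word list, the function names, and the sample comments are all made up for illustration; none of it is real Newegg data.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "of", "i", "my", "but", "for"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def count_terms(comments):
    """Count single keywords and word pairs across all comments."""
    keywords = Counter()
    pairs = Counter()
    for comment in comments:
        words = tokenize(comment)
        keywords.update(words)
        # Pairs are words that end up next to each other once stop words
        # are removed, and they never span two different comments.
        pairs.update(zip(words, words[1:]))
    return keywords, pairs


if __name__ == "__main__":
    # Made-up sample comments, not real review data.
    sample = [
        "Great power supply, the fan is quiet",
        "Quiet fan but the power supply died in a month",
    ]
    keywords, pairs = count_terms(sample)
    print(keywords.most_common(5))
    print(pairs.most_common(3))
```

Counting pairs after the stop words are stripped is a judgment call: it surfaces combinations like "power supply" more readily, at the cost of occasionally pairing words that weren't literally adjacent in the original sentence.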
The problem here is #1. When I display a page on Newegg, the comments are generated / grabbed from another resource. I'm curious if it's possible to tap that underlying resource and bypass the middleman (the generated webpage) for the sake of getting clean data (rather than parsing HTML with mixed scripts / XML / whatever else is powering the internet these days).
Although this specific question isn't at all about VB.NET, I have no idea how the underlying structure of a website works, and wouldn't know where to begin to even find this information. I'm hoping there will be an awesome developer here who's had to do something resembling this in the past, or who may just happen to have, or know someone with, knowledge on this type of thing.
Short Disclaimer: I know someone is going to mention TOU, or policies about copying / reusing data on a site.... Seeing as I'm doing nothing other than presenting data that's already publicly available, and not distributing it, I don't see how this would be any different than a custom-coded web browser. Any discussion / comments on this are also welcome. I just want to avoid anyone accusing me of deliberately trying to get around a terms of use, like parsing Google search results, for example... Also, if Newegg happens to have a publicly available API that I don't know about, but you do, that would be awesome to mention...
In the end I may end up having to parse HTML code... I've done it for other websites before, it's just a pain and I'm curious if there's a better way.
Edit: As I'm searching on the subject, I've found that Newegg has a JSON backend, http://www.ows.newegg.com/, as described on http://www.bemasher.net/archives/1002...
So far, I've only found product information, no comment data. I'm still researching whether comments are available through this source or elsewhere.
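If the JSON route pans out, steps 2 and 3 could look something like the sketch below (again Python, just to show the shape of it). To be clear, I don't know the actual schema yet: the `title` and `price` field names, the `Product` class, and the sample payload are placeholders I invented, and the download itself is left out so the parsing can be tried on a plain string.

```python
import json
from dataclasses import dataclass


@dataclass
class Product:
    """Hypothetical record type; the real fields are not known yet."""
    title: str
    price: float


def parse_products(raw_json):
    """Turn a JSON array of objects into a list of Product records."""
    records = json.loads(raw_json)
    products = []
    for record in records:
        # .get() with a default keeps one missing field from
        # aborting the whole parse.
        products.append(Product(
            title=record.get("title", ""),
            price=float(record.get("price", 0.0)),
        ))
    return products


if __name__ == "__main__":
    # Made-up payload standing in for a downloaded response.
    sample = '[{"title": "Example PSU", "price": "59.99"}, {"title": "Example fan"}]'
    for product in parse_products(sample):
        print(product)
```

The point is just that a JSON response maps straight onto a list of typed records, which is a lot less fragile than picking values out of generated HTML.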