Scholar parser
(highly experimental)
This class parses a Google Scholar profile page for publication data and scientist stats. The page can be fetched directly from Google Scholar by supplying the user's profile ID, or read from an HTML file saved from Scholar and passed to the class.
Installation
This module requires PHP-PhantomJS, which can be found at http://jonnnnyw.github.io/php-phantomjs/. For easy installation, I recommend using Composer (https://getcomposer.org/) with the included composer.json file by running the command
php composer.phar install
Example usage
Below is a very basic example. For a more elaborate one, see the example.php file, which uses a basic caching mechanism to avoid querying Scholar on every page view.
// Load the parser class
require_once("scholar_profile_parser.class.php");
// Create a new instance of the parser class
$parser = new ScholarProfileParser();
// The profile to parse (mine in this case)
$scholar_id = "Pm3O_58AAAAJ&hl";
// Read the html from Scholar into a DOM object
$parser->read_html_from_scholar_profile($scholar_id);
// Parse publication data from the DOM
$parser->parse_publications();
// Parse stats from the DOM (h-index, citation count, i10-index)
$parser->parse_stats();
// Print the output
$parser->print_parsed_data_raw(); //Basic output as stored in JSON
echo $parser->format_publications_in_APA(); //Formatted as HTML table
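The caching idea mentioned above can be sketched roughly as follows. This is not the code from example.php, just a minimal file-based cache under my own assumptions: the helper name cached_output, the cache file path, and the one-day lifetime are all hypothetical, and the commented usage assumes that format_publications_in_APA() returns a string.

```php
<?php
// Hypothetical helper, not part of the parser class: returns the cached
// content while it is still fresh, otherwise rebuilds it and stores the result.
function cached_output(string $cache_file, int $max_age_seconds, callable $producer): string
{
    // Make sure we see the current modification time of the cache file
    clearstatcache(true, $cache_file);

    // Serve the cached copy if it exists and is younger than the allowed age
    if (is_file($cache_file) && (time() - filemtime($cache_file)) < $max_age_seconds) {
        $cached = file_get_contents($cache_file);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache is missing or stale: produce fresh content and store it
    $fresh = $producer();
    file_put_contents($cache_file, $fresh, LOCK_EX);
    return $fresh;
}

// Possible usage with the parser (assumes the class file is available and
// that format_publications_in_APA() returns a string):
//
// require_once("scholar_profile_parser.class.php");
// echo cached_output(sys_get_temp_dir() . "/scholar_cache.html", 86400, function () {
//     $parser = new ScholarProfileParser();
//     $parser->read_html_from_scholar_profile("Pm3O_58AAAAJ&hl");
//     $parser->parse_publications();
//     return $parser->format_publications_in_APA();
// });
```

With this approach Scholar is only queried when the cache file is missing or older than the chosen lifetime, so ordinary page views are served from disk.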
API reference
The documentation for the API can be found here.