Scholar Profile parser  1.0
ScholarProfileParser Class Reference

Parser of Google Scholar user profile pages.

Public Member Functions

 ScholarProfileParser ($scholar_user_id="", $sort_by="year")
 
 read_html_from_scholar_profile ($id, $sort_by="year")
 
 read_html_from_file ($filename)
 
 save_to_json ($filename)
 
 read_json ($filename)
 
 parse_publications ($needs_year=true)
 
 parse_stats ()
 
 print_parsed_data_raw ()
 
 format_publications_in_APA ($show_citations=true, $table_header=true)
 
 get_stats ()
 

Detailed Description

Parser of Google Scholar user profile pages.

This class parses publications and stats from Google Scholar user pages. The retrieved data can be exported to json.
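A typical workflow can be sketched from the member functions listed above. The helper below is hypothetical and not part of the class; it assumes $parser is an already constructed ScholarProfileParser and calls only methods named in this reference. The call order (fetch, parse, export, format) is inferred from the descriptions and has not been checked against the source.

```php
<?php
// Hypothetical helper, not part of the documented class. It only calls
// methods listed in this reference; how the class file is loaded is left
// to the caller.
function refresh_profile_cache($parser, $user_id, $cache_file)
{
    // Fetch the profile page; per the docs the DOM is set as a side effect.
    $parser->read_html_from_scholar_profile($user_id, "year");

    // Fill parsed_data from the DOM.
    $parser->parse_publications(true);
    $parser->parse_stats();

    // Export to JSON for later use with read_json().
    $parser->save_to_json($cache_file);

    // Return the publications as an HTML table (citations column + header).
    return $parser->format_publications_in_APA(true, true);
}
```

Passing the parser in as an argument keeps the sketch independent of how and where the class is included.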

Author
Daniel Schreij
Version
1.0
Date
2015

Member Function Documentation

format_publications_in_APA ($show_citations = true, $table_header = true)

Formats the contents of $this->parsed_data["publications"] to an HTML table containing publications in APA format.

Parameters
boolean $show_citations  If true, an extra column showing the citation count of each article is added.
boolean $table_header  If true, the table is given a header row displaying column titles (in this case only for citations).
Returns
string The HTML code describing the table, in which each publication is shown in a row
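The effect of the two flags can be illustrated with a small stand-alone sketch. It is not the class's implementation: the function name and the record fields (authors, year, title, journal, citations) are assumptions chosen for illustration, and the reference string is only a rough APA-like layout.

```php
<?php
// Illustrative only: the field names are assumptions, not the structure
// the class actually stores in parsed_data["publications"].
function publications_to_table(array $pubs, $show_citations = true, $table_header = true)
{
    $html = "<table>\n";
    if ($table_header) {
        // Only the citations column gets a title, as the docs describe.
        $html .= "<tr><th></th>" . ($show_citations ? "<th>Citations</th>" : "") . "</tr>\n";
    }
    foreach ($pubs as $p) {
        // Escape every field before placing it in HTML.
        $e = array_map(function ($v) {
            return htmlspecialchars((string) $v);
        }, $p);
        $ref = "{$e['authors']} ({$e['year']}). {$e['title']}. <em>{$e['journal']}</em>.";
        $html .= "<tr><td>$ref</td>";
        if ($show_citations) {
            $html .= "<td>{$e['citations']}</td>";
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>";
}
```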
get_stats ( )

Returns the HTML currently stored in $this->parsed_data["stats"].

Returns
string HTML code displaying the stats in a table.
parse_publications ($needs_year = true)

Parses publications from the $this->dom variable and stores them in the $this->parsed_data variable with the key 'publications'.

Parameters
boolean $needs_year  If true, publications without a date are omitted (to prevent contamination of the list by things other than journal articles, books and conference posters).
Returns
array The publications parsed from the DOM
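The $needs_year filter amounts to dropping entries with no year. A minimal sketch, assuming each publication is an associative array with a 'year' key (a hypothetical structure; the real entries come from the parsed DOM and may be shaped differently):

```php
<?php
// Hypothetical stand-alone filter, not the class's own code.
function keep_dated(array $pubs, $needs_year = true)
{
    if (!$needs_year) {
        return $pubs;
    }
    // Drop entries whose year is missing or empty, then reindex the list.
    return array_values(array_filter($pubs, function ($p) {
        return !empty($p['year']);
    }));
}
```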
parse_stats ( )

Parses the stats of the scholar profile user (Citations, H-Index, and i10 index)

Returns
string The HTML in which the stats data is displayed in a table
print_parsed_data_raw ( )

Prints the data currently in the parsed_data variable (convenience function)

Returns
void
read_html_from_file ($filename)

Generates a DOM object by reading the html from the specified file. The function returns the plain HTML. The DOM object can be accessed by $<your_variable>->dom.

Parameters
string $filename  The file to read
Returns
string The HTML contained by the file
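The pattern described here (read a file, build a DOM, hand back the raw HTML) can be sketched with PHP's bundled DOM extension. This is an assumption for illustration: this reference does not say which DOM library the class uses, and load_dom_from_file is a hypothetical name.

```php
<?php
// Stand-alone sketch using ext-dom; the class itself may rely on a
// different DOM library.
function load_dom_from_file($filename)
{
    $html = file_get_contents($filename);
    if ($html === false) {
        throw new RuntimeException("Cannot read $filename");
    }

    $dom = new DOMDocument();
    // Real-world pages are rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Return both the plain HTML and the DOM built from it.
    return array($html, $dom);
}
```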
read_html_from_scholar_profile ($id, $sort_by = "year")

Sets the DOM object of the class to the html directly retrieved from Google Scholar. The function returns the plain HTML. The DOM object can be accessed by $<your_variable>->dom.

Parameters
string $id  The ID string of the Google Scholar user profile to parse
string $sort_by  The variable to sort the parsed data by ("year", or false to sort by number of citations)
Returns
string The retrieved HTML. The DOM object is automatically set by this function, overwriting its previous contents
read_json ($filename)

Read data from a json file and store it in the parsed_data variable

Parameters
string $filename  The path to the file
Returns
void
save_to_json ($filename)

Saves the current DOM object to a .json file

Parameters
string $filename  The destination file path and name to save to.
Returns
void
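Together, save_to_json and read_json suggest a simple JSON cache. The round trip can be sketched with PHP's built-in JSON functions; save_json and load_json below are hypothetical stand-ins, not the class's methods, and the data shape is made up for the example.

```php
<?php
// Hypothetical helpers showing a JSON round trip with built-in functions.
function save_json($data, $filename)
{
    $json = json_encode($data, JSON_PRETTY_PRINT);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }
    file_put_contents($filename, $json);
}

function load_json($filename)
{
    // true => decode JSON objects into associative arrays.
    $data = json_decode(file_get_contents($filename), true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException(json_last_error_msg());
    }
    return $data;
}
```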
ScholarProfileParser ($scholar_user_id = "", $sort_by = "year")

Constructor

Parameters
string $scholar_user_id  The ID string with which to identify the Scholar user profile (e.g. Pm3O_58AAAAJ)
string $sort_by  The variable to order the publications by (default 'year'; if 'false', they are ordered by citation count, descending)

The documentation for this class was generated from the following file: