
Reference

This is the Web API code reference, detailing the classes, methods, parameters, attributes, and each part of this application.

.
├── app
│   ├── adapters/
│   │   ├── factories/
│   │   ├── gateway/
│   │   ├── helpers/
│   │   └── presenters/
│   ├── core/
│   │   ├── common/
│   │   ├── config/
│   │   ├── data/
│   │   ├── domain/
│   │   └── usecases/
│   ├── framework/
│   │   ├── dependencies/
│   │   ├── exceptions/
│   │   ├── fastapi/
│   │   └── middleware/
│   └── utils/


SearchParams

Type validator for API Key and Keywords search params
source code core.common
app/core/common/types.py

class SearchParams(
    api_key: str = Field(),
    keywords: Keywords = Field()
)

A Pydantic BaseModel validator for the API Key and Keywords parameters used in Scopus Search API searches. It validates their types and values with the Pydantic Field() function.

Note

Read more about the API Key and Keywords rules and specifications in the requirements section.

Parameter Type Description
api_key str API Key search parameter
keywords Keywords Keywords search parameter
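
The following is a rough sketch of this kind of validator as a Pydantic v2 model, with both fields constrained through Field(). The Keywords alias and the length limits are hypothetical placeholders. They are not the application's actual rules, which are described in the requirements section.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

# Hypothetical stand-in for the application's Keywords type:
# a list of 1 to 3 non-empty search terms (the real rules may differ).
Keywords = Annotated[
    list[Annotated[str, Field(min_length=1)]],
    Field(min_length=1, max_length=3),
]


class SearchParams(BaseModel):
    # Assumed rule for illustration: a 32-character API key.
    api_key: str = Field(min_length=32, max_length=32)
    keywords: Keywords


params = SearchParams(api_key="a" * 32, keywords=["machine learning", "healthcare"])
print(params.keywords)  # ['machine learning', 'healthcare']

try:
    SearchParams(api_key="too-short", keywords=[])
except ValidationError as error:
    # Two failures: api_key is too short and keywords is empty.
    print(error.error_count())  # 2
```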


ScopusResult

Serializer for each entry item in the response JSON schema
source code core.data
app/core/data/serializers.py

class ScopusResult(
    link: str = Field(),
    url: str = Field(),
    scopus_id: str = Field()
)

A Pydantic BaseModel serializer for the entry items in the Scopus Search API response. It parses them with the Pydantic Field() function.

Parameter Type JSON Field Description
link str @_fa Top-level navigation links
url str prism:url Content Abstract Retrieval API URI
scopus_id str dc:identifier Article Scopus ID


ScopusSearch

Serializer for Scopus Search API response JSON schema
source code core.data
app/core/data/serializers.py

class ScopusSearch(
    total_results: int = Field(),
    items_per_page: int = Field(),
    entry: list[ScopusResult] = Field()
)

A Pydantic BaseModel serializer for the Scopus Search API JSON response. It parses the response with the Pydantic Field() function.

Before validation, it accesses the search-results JSON field to flatten the hierarchy and reach the actual data.

Info

Flattening is the process of transforming a nested JSON data structure into a single level of key-value pairs.

Parameter Type JSON Field Description
total_results int opensearch:totalResults Total number of articles found
items_per_page int opensearch:itemsPerPage Number of articles per page
entry list[ScopusResult] entry List of article data with the fields specified in the search

Note

Read more about the returned JSON body and its fields.

pages_count

def pages_count() -> int

Calculates the number of pages by dividing the total results by the number of items per page and rounding up with the math ceil() function. It returns the smallest int greater than or equal to the quotient.
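
The sketch below shows one way the flattening step and pages_count() could fit together in Pydantic v2. A mode="before" model validator unwraps search-results, and field aliases map the raw JSON keys. The None defaults and the zero-division guard are assumptions added for illustration.

```python
import math
from typing import Any, Optional

from pydantic import BaseModel, Field, model_validator


class ScopusResult(BaseModel):
    # Aliases map the raw Scopus JSON keys; missing keys default to None (null).
    link: Optional[str] = Field(default=None, alias="@_fa")
    url: Optional[str] = Field(default=None, alias="prism:url")
    scopus_id: Optional[str] = Field(default=None, alias="dc:identifier")


class ScopusSearch(BaseModel):
    total_results: int = Field(alias="opensearch:totalResults")
    items_per_page: int = Field(alias="opensearch:itemsPerPage")
    entry: list[ScopusResult] = Field(default_factory=list)

    @model_validator(mode="before")
    @classmethod
    def flatten(cls, data: Any) -> Any:
        # Unwrap the top-level "search-results" object before field validation.
        if isinstance(data, dict) and "search-results" in data:
            return data["search-results"]
        return data

    def pages_count(self) -> int:
        # Guard added for illustration: no pages when itemsPerPage is 0.
        if self.items_per_page == 0:
            return 0
        # Round up, e.g. 53 results at 25 per page need 3 pages.
        return math.ceil(self.total_results / self.items_per_page)


response = {
    "search-results": {
        "opensearch:totalResults": "53",  # numeric strings are coerced to int
        "opensearch:itemsPerPage": "25",
        "entry": [{"@_fa": "true", "dc:identifier": "SCOPUS_ID:0000000000"}],
    }
}
search = ScopusSearch.model_validate(response)
print(search.total_results, search.pages_count())  # 53 3
```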


ScopusQuotaRateLimit

Serializer for Scopus API responses
source code core.data
app/core/data/serializers.py

class ScopusQuotaRateLimit(
    reset: float = Field(),
    status: str = Field(),
    error_code: str = Field()
)

A Pydantic BaseModel serializer for the Scopus API responses. It parses them with the Pydantic Field() function.

Before validation, it retrieves the response headers and, if present, the error-response JSON field.

Parameter Type Response Field Description
reset float X-RateLimit-Reset Date/Time in Epoch seconds when API quota resets
status str X-ELS-Status Elsevier server/Scopus API status
error_code str error-code Elsevier server/Scopus API error code

Info

Epoch time, also known as Unix time, is the number of seconds that have elapsed since January 1, 1970 (UTC).

reset_datetime

def reset_datetime() -> str

Converts the epoch timestamp from the quota reset header to a datetime and formats it. It returns a human-readable str indicating when the API request quota will be reset.

quota_exceeded

def quota_exceeded() -> bool

Checks the value of the response status header, returning True if it equals QUOTA_EXCEEDED - Quota Exceeded and False otherwise.

Note

Learn more about the API request quota limit.

rate_limit_exceeded

def rate_limit_exceeded() -> bool

Checks the value of the response error code field, returning True if it equals RATE_LIMIT_EXCEEDED and False otherwise.

Note

Learn more about the API request throttling rate limit.
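
The three helpers can be sketched in plain Python, without Pydantic, so the header and body handling stays visible. ScopusQuotaRateLimitSketch is a hypothetical name and the error body shape is assumed. The UTC output format is one possible reading of "human-readable"; the application's actual format may differ.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


class ScopusQuotaRateLimitSketch:
    """Hypothetical plain-Python stand-in for the Pydantic serializer."""

    def __init__(self, headers: Mapping[str, str], body: Optional[dict[str, Any]] = None) -> None:
        self.reset = float(headers.get("X-RateLimit-Reset", 0))
        self.status = headers.get("X-ELS-Status")
        # Assumed body shape: {"error-response": {"error-code": "..."}}
        error = (body or {}).get("error-response") or {}
        self.error_code = error.get("error-code")

    def reset_datetime(self) -> str:
        # Epoch seconds -> timezone-aware UTC datetime -> formatted str (format assumed).
        reset = datetime.fromtimestamp(self.reset, tz=timezone.utc)
        return reset.strftime("%d-%m-%Y %H:%M:%S %Z")

    def quota_exceeded(self) -> bool:
        return self.status == "QUOTA_EXCEEDED - Quota Exceeded"

    def rate_limit_exceeded(self) -> bool:
        return self.error_code == "RATE_LIMIT_EXCEEDED"


limits = ScopusQuotaRateLimitSketch(
    {"X-RateLimit-Reset": "0", "X-ELS-Status": "QUOTA_EXCEEDED - Quota Exceeded"}
)
print(limits.quota_exceeded(), limits.reset_datetime())  # True 01-01-1970 00:00:00 UTC
```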


ScopusAbstract

Serializer for Scopus Abstract Retrieval API response JSON schema
source code core.data
app/core/data/serializers.py

class ScopusAbstract(
    url: str = Field(),
    scopus_id: str = Field(),
    authors: str = Field(),
    title: str = Field(),
    publication_name: str = Field(),
    abstract: str = Field(),
    date: str = Field(),
    eid: str = Field(),
    doi: str = Field(),
    volume: str = Field(),
    citations: str = Field()
)

A Pydantic BaseModel serializer for the article abstracts in the Scopus Abstract Retrieval API JSON response. It parses them with the Pydantic Field() function and sets any fields that are not returned to null.

Before validation, the hierarchy is flattened to reach the actual data. First, the abstracts-retrieval-response JSON field is accessed. Then the authors field is set from the author JSON field, which is taken from the authors JSON field if it is returned, or from the dc:creator JSON field otherwise.

Info

Flattening is the process of transforming a nested JSON data structure into a single level of key-value pairs.

Next, the author names are taken from the ce:indexed-name JSON field of each author entry and concatenated. Finally, the coredata JSON field is accessed and updated with the author data before being returned.

When serialized to a dict, the date field, if not null, is formatted as DD-MM-YYYY.
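
The author-flattening and date-formatting steps can be sketched with plain dicts. The function names, the "; " separator, and the location of dc:creator under coredata are assumptions for illustration.

```python
from datetime import date
from typing import Any, Optional


def flatten_abstract(payload: dict[str, Any]) -> dict[str, Any]:
    response = payload.get("abstracts-retrieval-response", {})
    coredata = dict(response.get("coredata", {}))
    # Prefer the full "authors" list; fall back to "dc:creator" (first author only).
    source = response.get("authors") or coredata.get("dc:creator") or {}
    authors = source.get("author", [])
    if isinstance(authors, dict):
        # A single author may be returned as an object instead of a list.
        authors = [authors]
    names = [a["ce:indexed-name"] for a in authors if a.get("ce:indexed-name")]
    # Merge the concatenated names into coredata; None (null) if there are none.
    coredata["authors"] = "; ".join(names) or None
    return coredata


def format_cover_date(value: Optional[str]) -> Optional[str]:
    # prism:coverDate is YYYY-MM-DD; the documented output is DD-MM-YYYY.
    if value is None:
        return None
    return date.fromisoformat(value).strftime("%d-%m-%Y")


payload = {
    "abstracts-retrieval-response": {
        "authors": {"author": [{"ce:indexed-name": "Doe J."}, {"ce:indexed-name": "Roe R."}]},
        "coredata": {"dc:title": "Example", "prism:coverDate": "2024-03-15"},
    }
}
flat = flatten_abstract(payload)
print(flat["authors"], format_cover_date(flat["prism:coverDate"]))  # Doe J.; Roe R. 15-03-2024
```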

Parameter Type JSON Field Description
url str link ref=scopus Scopus article preview page URL
scopus_id str dc:identifier Article Scopus ID
authors str authors or dc:creator Complete author list or only the first author
title str dc:title Article title
publication_name str prism:publicationName Source title
abstract str dc:description Article complete abstract
date str prism:coverDate Publication date
eid str eid Article Electronic ID
doi str prism:doi Digital Object Identifier
volume str prism:volume Identifier for a serial publication
citations str citedby-count Cited-by count

Note

Read more about the returned fields in the Scopus Search Views documentation.


AccessToken

Get and validate the Access Token
source code framework.dependencies
app/framework/dependencies/access_token.py

class AccessToken()(
    request: Request,
    access_token: Annotated[str | None, TokenHeader] = None
)

A route dependency that implements the __call__ method, which makes each instance callable. Calling it obtains and validates the Access Token header via the FastAPI Header() function or the request.

For additional security, the application automatically generates a Token and passes it to the application's API web page. The page then sends the Token back in the request header for validation.

Parameter Type Description
request Request The FastAPI Request object
access_token str or None Request Token header descriptor and validator. Default: None
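
A minimal sketch of the token idea using only the standard library. It assumes the token is generated once at startup and compared in constant time. AccessTokenSketch is hypothetical and returns a bool rather than raising the application's 401 Unauthorized exception.

```python
import hmac
import secrets
from typing import Optional


class AccessTokenSketch:
    def __init__(self) -> None:
        # Random URL-safe token generated once, e.g. at application startup.
        self.token = secrets.token_urlsafe(32)

    def __call__(self, access_token: Optional[str] = None) -> bool:
        # Missing header: reject (the application would raise 401 Unauthorized).
        if access_token is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(access_token.encode(), self.token.encode())


check = AccessTokenSketch()
print(check(check.token), check("wrong-token"))  # True False
```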


QueryParams

Get and validate the Query Params
source code framework.dependencies
app/framework/dependencies/query_params.py

class QueryParams()(
    request: Request,
    api_key: Annotated[str | None, APIKeyQuery] = None,
    keywords: Annotated[Keywords | None, KeywordsQuery] = None
)

A route dependency that implements the __call__ method, which makes each instance callable. Calling it obtains and validates the API Key and Keywords query parameters via the FastAPI Query() function or the request.

Parameter Type Description
request Request The FastAPI Request object
api_key str or None Request API Key Query Param descriptor and validator. Default: None
keywords Keywords or None Request Keywords Query Param descriptor and validator. Default: None

equals

def equals(api_key: str, keywords: list[str]) -> bool

Compares the instance's API Key and Keywords with the given API Key and Keywords, returning True if they are equal and False otherwise.

Parameter Type Description
api_key str API Key to compare
keywords list[str] Keywords to compare

to_dict

def to_dict() -> dict[str, str | Keywords]

Serializes the API Key and Keywords instance attributes as a dict.
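
The callable-instance pattern behind QueryParams, equals(), and to_dict() can be sketched without FastAPI. When such an instance is passed to Depends(), FastAPI calls it for each request. QueryParamsSketch is a hypothetical name, and the order-sensitive keyword comparison is an assumption.

```python
from typing import Optional


class QueryParamsSketch:
    """Hypothetical stand-in: a callable dependency that stores the parsed query params."""

    def __init__(self) -> None:
        self.api_key: Optional[str] = None
        self.keywords: Optional[list[str]] = None

    def __call__(
        self, api_key: Optional[str] = None, keywords: Optional[list[str]] = None
    ) -> "QueryParamsSketch":
        # In FastAPI these values would come from Query() parameters or the request.
        self.api_key = api_key
        self.keywords = keywords
        return self

    def equals(self, api_key: str, keywords: list[str]) -> bool:
        # Order-sensitive list comparison (an assumption for this sketch).
        return self.api_key == api_key and self.keywords == keywords

    def to_dict(self) -> dict[str, object]:
        return {"api_key": self.api_key, "keywords": self.keywords}


params = QueryParamsSketch()(api_key="my-key", keywords=["python", "fastapi"])
print(params.equals("my-key", ["python", "fastapi"]))  # True
```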


HTTPRetryHelper

Make HTTP requests with throttling and retry mechanisms
source code adapters.helpers
app/adapters/helpers/http_retry_helper.py

class HTTPRetryHelper(
    for_search: bool = None
)

An HTTP client for making requests with the following mechanisms:

  • Throttling: controls the rate of data flow into a service by limiting the number of API requests a user can make in a certain period.
  • Retry: automatically retries failed operations to recover from unexpected failures and continue functioning correctly.
  • Rate Limiting: limits network traffic by controlling the number of requests that can be made within a given period of time.
  • Session: persists certain parameters and reuses the same connection across all requests.
  • Cache: temporarily stores data so that future requests for that data can be fulfilled more quickly.

Parameter Type Description
for_search bool Indicates in the log message whether the request will be directed to the Scopus Search API or not. Default: None
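
The throttling and retry mechanisms can be illustrated with a small standard-library sketch. It models only the control flow: a minimum spacing between calls and exponential backoff on ConnectionError. The actual helper mounts a cache-control adapter with a retry configuration on a Requests session. ThrottledRetry and its defaults are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledRetry:
    """Hypothetical sketch: not thread-safe, single caller assumed."""

    def __init__(self, retries: int = 3, backoff: float = 0.5, min_interval: float = 0.1) -> None:
        self.retries = retries
        self.backoff = backoff
        self.min_interval = min_interval
        self._last_call = float("-inf")  # the first call never waits

    def _throttle(self) -> None:
        # Sleep until at least min_interval seconds have passed since the last call.
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()

    def run(self, operation: Callable[[], T]) -> T:
        for attempt in range(self.retries + 1):
            self._throttle()
            try:
                return operation()
            except ConnectionError:
                if attempt == self.retries:
                    raise  # out of retries: propagate the last failure
                # Exponential backoff: backoff, 2 * backoff, 4 * backoff, ...
                time.sleep(self.backoff * 2 ** attempt)
        raise RuntimeError("unreachable")
```

Backing off exponentially spreads retries out so a struggling server is not hit again immediately, while the throttle keeps successful calls under a fixed request rate.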

mount_session

def mount_session(headers: Headers) -> None

Initializes the session and mounts it by registering the cache-control connection adapter with the retry configuration.

Parameter Type Description
headers Headers The HTTP headers to send in the request

close

def close() -> None

Closes the cache-control connection adapter and session.

request

def request(url: str) -> Response

Initializes the request, prepares it with the session, sends it, and returns the response as a Requests Response object.

Parameter Type Description
url str The URL to send the request to


URLBuilderHelper

Generate and format URLs for HTTP requests
source code adapters.helpers
app/adapters/helpers/url_builder_helper.py

class URLBuilderHelper()

A builder that generates the Scopus API resource URLs and the pagination URL.

get_search_url

def get_search_url(keywords: Keywords) -> str

Generates the Scopus Search API resource URL and returns it as a str.

Parameter Type Description
keywords Keywords The keywords that will be used in the search

get_pagination_url

def get_pagination_url(page: int) -> str

Generates the Scopus Search API pagination URL and returns it as a str.

Parameter Type Description
page int The page index for the start of pagination

get_abstract_url

def get_abstract_url(url: str) -> str

Generates the Scopus Abstract Retrieval API resource URL and returns it as a str.

Parameter Type Description
url str The Scopus Abstract Retrieval API base resource URL
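
A sketch of how such a builder might assemble the URLs with urllib.parse. The base endpoint is the public Scopus Search API URL. The TITLE-ABS-KEY query, the count/start pagination parameters, the view value, and the class name URLBuilderSketch are assumptions, not the application's actual format.

```python
from urllib.parse import urlencode

# Public Scopus Search API endpoint; everything appended to it below is assumed.
SEARCH_BASE_URL = "https://api.elsevier.com/content/search/scopus"


class URLBuilderSketch:
    def __init__(self, page_size: int = 25) -> None:
        self.page_size = page_size
        self.search_url = ""

    def get_search_url(self, keywords: list[str]) -> str:
        # Combine the keywords with AND inside TITLE-ABS-KEY() field searches.
        query = " AND ".join(f'TITLE-ABS-KEY("{keyword}")' for keyword in keywords)
        params = {"query": query, "count": self.page_size}
        self.search_url = f"{SEARCH_BASE_URL}?{urlencode(params)}"
        return self.search_url

    def get_pagination_url(self, page: int) -> str:
        # Call get_search_url() first; the page index becomes the "start" offset.
        return f"{self.search_url}&start={page * self.page_size}"

    def get_abstract_url(self, url: str) -> str:
        # The view value is an assumption; it controls which fields are returned.
        return f"{url}?{urlencode({'view': 'FULL'})}"


builder = URLBuilderSketch()
builder.get_search_url(["python", "fastapi"])
print(builder.get_pagination_url(2))  # ...&count=25&start=50
```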


ScopusSearchAPI

Search and retrieve articles via the Scopus Search API
source code adapters.gateway
app/adapters/gateway/scopus_search_api.py

class ScopusSearchAPI(
    http_helper: HttpRetry,
    url_builder: UrlBuilder
)

First, the request headers for the Scopus API are built with the API Key, and the resource URL is built with the API Key and Keywords as search parameters. The articles are then searched via the Scopus Search API. The response is validated, returning the articles if successful or handling the errors otherwise.

An error is returned when no articles are found, when the API Key quota is exceeded, when the Scopus Search API returns an HTTP status error, or when the JSON response cannot be validated.

The article data is validated, defaulting to null for fields that are not returned. When the results span multiple pages, it may use threads via ThreadPoolExecutor, building each page URL from its page index.

Parameter Type Description
http_helper HttpRetry Injects HTTPRetryHelper to make the requests
url_builder UrlBuilder Injects URLBuilderHelper to build the URLs

search_articles

def search_articles(search_params: SearchParams) -> list[ScopusResult]

Searches for articles via the Scopus Search API, compiles all retrieved data, and returns it as a list of ScopusResult.

Parameter Type Description
search_params SearchParams Validated API Key and Keywords search parameters
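
The pagination step can be sketched with ThreadPoolExecutor. The sketch assumes the first page has already been fetched and that each remaining page is retrieved by a hypothetical fetch_page(page_index) callable. executor.map() yields results in input order, so the combined list keeps page order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def fetch_all_pages(
    first_page: list[dict[str, Any]],
    pages_count: int,
    fetch_page: Callable[[int], list[dict[str, Any]]],
    max_workers: int = 4,
) -> list[dict[str, Any]]:
    results = list(first_page)
    if pages_count <= 1:
        return results
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Pages 1..pages_count-1 are fetched concurrently; map() preserves order.
        for entries in executor.map(fetch_page, range(1, pages_count)):
            results.extend(entries)
    return results


def fake_fetch_page(page: int) -> list[dict[str, Any]]:
    # Hypothetical stand-in for a request to one page's pagination URL.
    return [{"page": page}]


print(fetch_all_pages([{"page": 0}], 3, fake_fetch_page))
# [{'page': 0}, {'page': 1}, {'page': 2}]
```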


ScopusAbstractRetrievalAPI

Retrieves Scopus abstracts via the Scopus Abstract Retrieval API
source code adapters.gateway
app/adapters/gateway/scopus_abstract_retrieval_api.py

class ScopusAbstractRetrievalAPI(
    http_helper: HttpRetry,
    url_builder: UrlBuilder
)

First, the request headers for the Scopus API are built with the API Key, and the resource URL is built from the base resource URL. The Scopus abstracts are then retrieved via the Scopus Abstract Retrieval API. The response is validated, returning the abstracts if successful or handling the errors otherwise.

An error is returned when the API Key quota is exceeded, when the Scopus Abstract Retrieval API returns an HTTP status error, or when the JSON response cannot be validated.

The abstract data is validated, defaulting to null for fields that are not returned. It may use threads via ThreadPoolExecutor when there are multiple abstracts to retrieve.

Parameter Type Description
http_helper HttpRetry Injects HTTPRetryHelper to make the requests
url_builder UrlBuilder Injects URLBuilderHelper to build the URLs

retrieve_abstracts

def retrieve_abstracts(api_key: str, entry: list[ScopusResult]) -> DataFrame

Retrieves Scopus abstracts via the Scopus Abstract Retrieval API, compiles all fetched data, and returns it as a Pandas DataFrame.

Parameter Type Description
api_key str Validated API Key search parameter
entry list[ScopusResult] List of article data


ArticlesSimilarityFilter

Filter articles from identical authors with similar titles
source code core.usecases
app/core/usecases/articles_similarity_filter.py

class ArticlesSimilarityFilter()

The filter starts from the DataFrame containing all the article information already gathered. It counts the authors and selects those appearing at least twice. Then it compares the titles of these authors' articles using the TheFuzz ratio() function and groups together the articles whose similarity ratio is at least 80%.

Note

After consideration and testing, we set the similarity threshold for article selection at 80%.

For each group of similar articles, the first one is kept and the rest are discarded. If all authors are unique, or if no similar titles are found, the same DataFrame is returned unchanged.
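
A standard-library sketch of this filter that works on a list of dicts rather than a DataFrame. difflib.SequenceMatcher stands in for TheFuzz's ratio(), so scores may differ slightly. The function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Any


def title_ratio(a: str, b: str) -> int:
    # Stand-in for TheFuzz fuzz.ratio(): similarity as an integer from 0 to 100.
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def filter_similar(articles: list[dict[str, Any]], threshold: int = 80) -> list[dict[str, Any]]:
    counts = Counter(article["authors"] for article in articles)
    repeated = {author for author, count in counts.items() if count >= 2}
    if not repeated:
        return articles  # all authors are unique: nothing to filter

    discarded: set[int] = set()
    for author in repeated:
        indexes = [i for i, article in enumerate(articles) if article["authors"] == author]
        for position, keep in enumerate(indexes):
            if keep in discarded:
                continue
            # Keep the earlier article; discard later ones with similar titles.
            for other in indexes[position + 1:]:
                if other not in discarded and title_ratio(
                    articles[keep]["title"], articles[other]["title"]
                ) >= threshold:
                    discarded.add(other)
    return [article for i, article in enumerate(articles) if i not in discarded]


articles = [
    {"authors": "Doe J.", "title": "Deep learning for code search"},
    {"authors": "Doe J.", "title": "Deep learning for code search."},
    {"authors": "Roe R.", "title": "Deep learning for code search"},
    {"authors": "Doe J.", "title": "A survey of graph databases"},
]
print([a["title"] for a in filter_similar(articles)])
```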

filter

def filter(dataframe: DataFrame) -> DataFrame

Filters out articles from the DataFrame that come from identical authors with similar titles, and returns all remaining data in a Pandas DataFrame.

Parameter Type Description
dataframe DataFrame The DataFrame containing all the gathered article information to be filtered


ScopusArticlesAggregator

Gathers, filters and compiles data from Scopus articles
source code core.usecases
app/core/usecases/scopus_articles_aggregator.py

class ScopusArticlesAggregator(
    search_api: SearchAPI,
    abstract_api: AbstractAPI,
    similarity_filter: SimilarityFilter
)

First, articles are searched via the Scopus Search API using the provided search parameters, and their Scopus abstracts are retrieved via the Scopus Abstract Retrieval API.

Next, articles that are exact duplicates are removed, those with the same authors and titles are also discarded, and similar articles are filtered using ArticlesSimilarityFilter.

An error is returned when no articles are found.

Parameter Type Description
search_api SearchAPI Injects ScopusSearchAPI to search and get the articles via the Scopus Search API
abstract_api AbstractAPI Injects ScopusAbstractRetrievalAPI to retrieve the Scopus abstracts via the Scopus Abstract Retrieval API
similarity_filter SimilarityFilter Injects ArticlesSimilarityFilter to filter articles by identical authors with similar titles

get_articles

def get_articles(params: SearchParams) -> FileResponse

Gathers and filters data from Scopus articles, writes and saves all remaining articles to a CSV file, and returns them as a FastAPI FileResponse object.

Parameter Type Description
params SearchParams The validated API Key and Keywords search parameters


TemplateContextBuilder

Generates context values for template responses
source code adapters.presenters
app/adapters/presenters/template_context.py

class TemplateContextBuilder(
    request: Request
)

Compiles and builds the data, such as context values, for the templates that Jinja renders. The data is loaded into HTML templates that are returned as a Jinja2Templates TemplateResponse object.

Parameter Type Description
request Request The FastAPI Request object

get_web_app_context

def get_web_app_context() -> Context

Returns the data used to build the API web page response template: the request object, the template name, and the context values.

About the Context values:

Field Description
request The FastAPI Request object
version Application version. Example: 3.0.0
repository URL of the application's GitHub repository
swagger_url Swagger page URL. Default: /
token Application Token
filename CSV filename. Default: articles.csv
table_url Table web page URL. Default: /scopus-survey/api/table
search_url API URL. Default: /scopus-survey/api/search-articles
description Application description

get_table_context

def get_table_context() -> Context

Returns the data used to build the Table web page response template: the request object, the template name, and the context values.

About the Context values:

Field Description
request The FastAPI Request object
version Application version. Example: 3.0.0
repository URL of the application's GitHub repository
swagger_url Swagger page URL. Default: /
content Table content. List of the articles found or None if there are no articles
web_app_url Application web page URL. Default: /scopus-survey/api


ExceptionJSON

Generates JSON representation responses for exceptions
source code adapters.presenters
app/adapters/presenters/exception_json.py

class ExceptionJSON(
    request: Request,
    code: int,
    message: str,
    errors: Errors = None
)

A presenter built on the FastAPI JSONResponse that generates JSON representation responses for exceptions. The error details are filtered to remove the PydanticUndefined error from Pydantic ValidationError, and the Request object data is retrieved.

The timestamp is set as an ISO-format str. Finally, all data is converted and encoded with the FastAPI jsonable_encoder() function.

Parameter Type Description
request Request The FastAPI Request object
code int HTTP status error code
message str Exception description
errors Errors Error metadata and details


Exceptions and Errors

HTTP Exceptions are models built from the FastAPI HTTPException. They represent the HTTP error status codes sent in the response to notify the client of an error. The ones implemented are 401 Unauthorized, 404 NotFound, 422 UnprocessableContent, 500 InternalError, 502 BadGateway and 504 GatewayTimeout.

Application Errors are models built from the base class Exception. They indicate that an error has occurred in the core of the application's operation/processing. The ones implemented are InterruptError, for the shutdown/exit interrupt signal, and ScopusAPIError, for the Scopus Search API HTTP status error.

Exception Handlers are routines that process and respond to exceptions/errors or specific special situations during program execution, returning their JSON representation. The implemented handlers are for Starlette HTTPException, FastAPI HTTPException, RequestValidationError, ResponseValidationError, ValidationError, HTTPException, ApplicationError and Exception.

About the Exception JSON Response:

Field Type Description
success bool Result of the operation, which is always a failure for an exception. Default: False
code int HTTP error status code
message str Exception/error description
request dict[str, Any] Contains some request data in a dict
errors Errors Contains some details of the exception/error in a dict. Default: None
timestamp str The datetime timestamp as a str in ISO format

About the request field:

Field Description
host The request client host. Default: 127.0.0.1
port The request client port. Default: 8000
method The request method
url The request URL path
headers The request headers
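
A sketch of the resulting JSON body using only the standard library. exception_body() is a hypothetical helper and the UTC timestamp is an assumption. The application itself builds the response with FastAPI's jsonable_encoder() and JSONResponse.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional


def exception_body(
    code: int,
    message: str,
    request: dict[str, Any],
    errors: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    return {
        "success": False,  # always False for an exception response
        "code": code,
        "message": message,
        "request": request,
        "errors": errors,
        # ISO 8601 str; using UTC here is an assumption.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


body = exception_body(
    404,
    "Not Found",
    {"host": "127.0.0.1", "port": 8000, "method": "GET", "url": "/missing", "headers": {"accept": "*/*"}},
)
print(json.dumps(body, indent=2))
```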

About ScopusAPIError error details:

Field Description
status The Scopus APIs HTTP status error code
api_error The Scopus APIs response status error description. Default: null
content The Scopus APIs JSON response content itself

Note

See the responses status error description in the documentation.


Middlewares

Middlewares are mechanisms that work in the application's request-response cycle, intercepting calls and processing them. They can access and manipulate each request object before any route handler processes it, and each response object before it is returned. Three are implemented: two custom middlewares built on top of the Starlette BaseHTTPMiddleware, and the FastAPI CORSMiddleware.

The TraceExceptionControl middleware traces the request, reporting the client, the URL accessed, the response status code, and the processing time. It also handles any unexpected exceptions and signal-interrupt errors.

The RedirectNotFoundRoutes middleware redirects any request that results in a 404 Not Found error and does not match a mapped, allowed route. It also handles signal-interrupt errors.

The FastAPI CORSMiddleware implements and configures the CORS mechanism, allowing any origin, credentials, any header, and only the GET method.


SignalHandler

Registers signal handlers that set the shutdown event flag
source code utils
app/utils/signal_handler.py

class SignalHandler(
    for_async: bool = None
)

Creates an event object, either a threading Event or an asyncio Event depending on the parameter value. It then registers handlers for the SIGINT and SIGTERM signals using the signal() function. The handlers catch shutdown signals and set the event flag, so process-based or threaded operations can be terminated gracefully.

Info

A graceful shutdown is a controlled and orderly process to perform a safe shutdown and free up resources when the application is suddenly interrupted or receives a shutdown/kill signal.

Parameter Type Description
for_async bool Indicates whether the event will be asynchronous or not. Default: None
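
A minimal sketch of this pattern. It assumes the handler is created in the main thread, which signal.signal() requires. SignalHandlerSketch is hypothetical. For asyncio code, loop.add_signal_handler() is the more idiomatic way to register handlers.

```python
import asyncio
import signal
import threading
from types import FrameType
from typing import Optional, Union


class SignalHandlerSketch:
    def __init__(self, for_async: Optional[bool] = None) -> None:
        # asyncio.Event for coroutine-based code, threading.Event otherwise.
        self.event: Union[asyncio.Event, threading.Event] = (
            asyncio.Event() if for_async else threading.Event()
        )
        # signal.signal() only works in the main thread of the main interpreter.
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self._handle)

    def _handle(self, signum: int, frame: Optional[FrameType]) -> None:
        # Only set the flag; workers check it and release resources themselves.
        self.event.set()
```

A worker thread can then loop with `while not handler.event.is_set(): ...` and clean up once the flag is set. This keeps the signal handler itself trivial.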